Preface

For the final exam/project we will develop classification models using several approaches and compare their performance on a new dataset: the so-called “Census Income” dataset from UCI ML. It is available on the UCI ML website, but so that we are not at the mercy of UCI ML availability, there is also a local copy of it on our course website in Canvas as a zip archive of all associated files. Among other things, the description of this dataset also presents the performance (prediction accuracy) observed by the dataset providers using a variety of modeling techniques; this supplies a context for the errors of the models we will develop here.

Please note that the original data has been split into training and test subsets, but there doesn’t seem to be anything particular about that split, so we might want to pool the two datasets together and split them into training and test sets ourselves as necessary. As you do that, please check that the attribute levels are consistent between the two files. For instance, the categorized income levels use slightly different notation in the training and test data (the test file appends a trailing period, e.g. “<=50K.”). By now it should be quite straightforward for you to correct that when you pool them together.
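Here is a minimal sketch of that correction. It assumes the labels in the two files differ only by surrounding whitespace and the trailing period in the test file. The vectors are toy values, and `harmonize_income` is a hypothetical helper name.

```r
# Toy label vectors mimicking the two files (not the real data)
train.salary <- c(" <=50K", " >50K", " <=50K")
test.salary  <- c(" <=50K.", " >50K.")

# Hypothetical helper: trim surrounding whitespace, then drop a trailing period
harmonize_income <- function(x) sub("\\.$", "", trimws(as.character(x)))

# Pool both sets of labels into a single factor with consistent levels
pooled.salary <- factor(c(harmonize_income(train.salary),
                          harmonize_income(test.salary)))
table(pooled.salary)
```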

Also, please note that there is a non-negligible number of rows with missing values that, for most analyses, cannot be included in the computation without modification. Please decide how you want to handle them and proceed accordingly. The simplest and perfectly acceptable approach would be to exclude those observations from the rest of the analyses, but if you have the time and inclination to investigate the impact of imputing them by various means, you are welcome to try.
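The following sketch shows the simplest approach (complete cases only) on a toy data frame. It assumes missing values are coded as " ?" with the leading space that `read.table(sep = ",")` keeps by default. It counts the missing entries per column, drops the incomplete rows, and removes the now-unused " ?" levels.

```r
# Toy data frame: the raw files mark missing values with " ?" (leading space kept by read.table)
toy <- data.frame(age = c(39, 50, 38, 53),
                  workclass = c(" State-gov", " ?", " Private", " Private"),
                  occupation = c(" Adm-clerical", " ?", " ?", " Sales"),
                  stringsAsFactors = TRUE)

toy[toy == " ?"] <- NA          # mark the placeholders as missing
colSums(is.na(toy))             # number of missing values per column

# Keep complete cases only and drop factor levels that no longer occur
complete.toy <- droplevels(na.omit(toy))
nrow(complete.toy)
```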

The attribute called “final weight” (fnlwgt) in the dataset description represents the demographic weighting of these observations. Please disregard it for the purposes of this assignment.

Additionally, several attributes in this dataset are categorical variables with more than two levels (e.g. native country, occupation). Please make sure to translate them into corresponding sets of dummy indicator variables for the methods that require such conversion (e.g. PCA). The R function model.matrix can be convenient for this, instead of generating those 0/1 indicators for each level of the factor manually (which is still perfectly fine). Some of those multi-level factors contain very sparsely populated categories, e.g. occupation “Armed-Forces” or work class “Never-worked”. It is your call whether you want to keep those observations in the data or exclude them on the basis that there is not enough data to adequately capture the impact of those categories. Feel free to experiment!
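The sketch below contrasts two encodings produced by model.matrix on a toy data frame with hypothetical values. The default treatment contrasts drop one reference level per factor. Passing `contrasts(x, contrasts = FALSE)` through `contrasts.arg` instead yields one 0/1 indicator per level, which may be preferable as PCA input.

```r
# Toy data frame (hypothetical values) with a multi-level factor
toy <- data.frame(occupation = factor(c("Sales", "Tech-support", "Craft-repair", "Sales")),
                  age = c(25, 47, 33, 52))

# Default treatment contrasts: intercept plus (number of levels - 1) indicators
X.treat <- model.matrix(~ ., data = toy)
colnames(X.treat)

# One 0/1 indicator per level for every factor column
fac <- sapply(toy, is.factor)
X.full <- model.matrix(~ ., data = toy,
                       contrasts.arg = lapply(toy[fac], contrasts, contrasts = FALSE))
colnames(X.full)
```

With full indicator coding, the indicator columns of each factor sum to 1 in every row, so they are collinear with the intercept. This is harmless for PCA but not for an ordinary regression fit.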

Among the multi-level categorical attributes, native country has the largest number of levels, several-fold more than any other attribute in this dataset, and some of those levels have relatively few observations. This increase in the dimensionality of the data may not be accompanied by a corresponding gain in resolution: e.g. would we expect this data to support a difference in income between natives of Peru and Nicaragua, or of Cambodia and Laos? Please feel free to evaluate the impact of including and/or omitting this attribute in/from the model and/or discretizing it differently (e.g. US/non-US).
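Two possible discretizations are sketched below on toy country values: a binary US/non-US indicator, and lumping rare levels into "Other". The threshold `min.n` and the helper name `lump_rare` are hypothetical choices for illustration. Missing values are left as NA in both cases.

```r
# Toy country values with the leading space read.table keeps (not the real data)
country <- factor(c(" United-States", " Mexico", " United-States", " Peru", NA))

# Option 1: binary US / non-US indicator (NA stays NA)
us.flag <- factor(ifelse(trimws(as.character(country)) == "United-States", "US", "non-US"),
                  levels = c("US", "non-US"))
table(us.flag, useNA = "ifany")

# Option 2 (hypothetical helper): lump levels with fewer than min.n observations into "Other"
lump_rare <- function(f, min.n) {
  counts <- table(f)                       # NA is not counted
  rare <- names(counts)[counts < min.n]
  f <- as.character(f)
  f[f %in% rare] <- "Other"
  factor(f)
}
lump_rare(country, min.n = 2)
```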

Lastly, the size of this dataset can make some of the modeling techniques run slower than what we typically encountered in this class. You may find it helpful to do some of the exploration and model tuning on multiple smaller random samples while you decide on useful ranges of parameters and modeling choices, and then perform only the final run of fully debugged, working code on the full dataset.
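One way to set up such subsamples is sketched below. It draws a few independent sets of row indices. The subsample size (5000), the number of draws (3) and the helper name `draw_subsamples` are hypothetical choices. 45221 is the number of complete-case rows obtained later in this analysis.

```r
# Hypothetical helper: draw `times` independent random subsamples of row indices
draw_subsamples <- function(n.rows, size, times) {
  lapply(seq_len(times), function(i) sample.int(n.rows, size))
}

set.seed(63)                                   # arbitrary seed for reproducibility
idx <- draw_subsamples(n.rows = 45221, size = 5000, times = 3)
# e.g. tune on noNAData[idx[[1]], ], check on idx[[2]] and idx[[3]], then rerun on all rows
```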

#Prepare the data
# load the packages used below: ggplot2 (qplot/ggplot) and Amelia (missmap)
library(ggplot2)
library(Amelia)
# combine both the datasets 
setwd("/Users/RaviRani/Documents/Harvard-Extension/CSCI E-63/finalexam")
col.names <- c("age","workclass","fnlwgt","education","education_num","marital_status","occupation","relationship","race","sex","capital_gain","capital_loss","hours_per_week","native_country","salary")
traindata<-read.table("adult.data.1",sep=",",header=FALSE,quote="",stringsAsFactors=TRUE,col.names=col.names)
ncol(traindata)
## [1] 15
# note: the UCI copy of adult.test starts with a non-data line ("|1x3 Cross validator");
# if present, remove it from the file or add skip = 1 so read.table does not fail
testdata<-read.table("adult.test",sep=",",header=FALSE,quote="",stringsAsFactors=TRUE,col.names=col.names)
ncol(testdata)
## [1] 15
# the test file labels income as " <=50K." / " >50K."; drop the trailing period so
# the labels match the training file (" <=50K" / " >50K")
testdata$salary <- factor(sub("\\.$", "", as.character(testdata$salary)))

#head(traindata)
#head(testdata)
# drop the 'final weight' attribute (column 3, fnlwgt) and stack the training and test rows
merged.data <- rbind(traindata[,-3], testdata[,-3])
table(merged.data$salary)
## 
##  <=50K   >50K 
##  37154  11687
nrow(traindata)
## [1] 32560
nrow(testdata)
## [1] 16281
nrow(merged.data)
## [1] 48841
class(merged.data)
## [1] "data.frame"
head(merged.data)
##   age         workclass  education education_num      marital_status
## 1  39         State-gov  Bachelors            13       Never-married
## 2  50  Self-emp-not-inc  Bachelors            13  Married-civ-spouse
## 3  38           Private    HS-grad             9            Divorced
## 4  53           Private       11th             7  Married-civ-spouse
## 5  28           Private  Bachelors            13  Married-civ-spouse
## 6  37           Private    Masters            14  Married-civ-spouse
##           occupation   relationship   race     sex capital_gain
## 1       Adm-clerical  Not-in-family  White    Male         2174
## 2    Exec-managerial        Husband  White    Male            0
## 3  Handlers-cleaners  Not-in-family  White    Male            0
## 4  Handlers-cleaners        Husband  Black    Male            0
## 5     Prof-specialty           Wife  Black  Female            0
## 6    Exec-managerial           Wife  White  Female            0
##   capital_loss hours_per_week native_country salary
## 1            0             40  United-States  <=50K
## 2            0             13  United-States  <=50K
## 3            0             40  United-States  <=50K
## 4            0             40  United-States  <=50K
## 5            0             40           Cuba  <=50K
## 6            0             40  United-States  <=50K
# replace the "?" placeholders with NA, then re-factor to drop the unused " ?" level
merged.data[merged.data == " ?"]=NA
merged.data$native_country<-factor(merged.data$native_country)
merged.data$workclass<-factor(merged.data$workclass)
merged.data$occupation<-factor(merged.data$occupation)



#class(salary)
#attach(merged.data)

# after replacing "?" with NA
head(merged.data)
##   age         workclass  education education_num      marital_status
## 1  39         State-gov  Bachelors            13       Never-married
## 2  50  Self-emp-not-inc  Bachelors            13  Married-civ-spouse
## 3  38           Private    HS-grad             9            Divorced
## 4  53           Private       11th             7  Married-civ-spouse
## 5  28           Private  Bachelors            13  Married-civ-spouse
## 6  37           Private    Masters            14  Married-civ-spouse
##           occupation   relationship   race     sex capital_gain
## 1       Adm-clerical  Not-in-family  White    Male         2174
## 2    Exec-managerial        Husband  White    Male            0
## 3  Handlers-cleaners  Not-in-family  White    Male            0
## 4  Handlers-cleaners        Husband  Black    Male            0
## 5     Prof-specialty           Wife  Black  Female            0
## 6    Exec-managerial           Wife  White  Female            0
##   capital_loss hours_per_week native_country salary
## 1            0             40  United-States  <=50K
## 2            0             13  United-States  <=50K
## 3            0             40  United-States  <=50K
## 4            0             40  United-States  <=50K
## 5            0             40           Cuba  <=50K
## 6            0             40  United-States  <=50K
# remove rows with NA's - we will be using this data set for our calculations
noNAData=na.omit(merged.data)
# re-factor to drop levels that no longer occur after removing rows
noNAData$native_country<-factor(noNAData$native_country)
noNAData$workclass<-factor(noNAData$workclass)
noNAData$occupation<-factor(noNAData$occupation)

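# Normalize (center and scale) the numeric variables; as.numeric keeps each column
# a plain vector instead of the one-column matrix that scale() returns
num.vars <- sapply(noNAData, is.numeric)
noNAData[num.vars] <- lapply(noNAData[num.vars], function(x) as.numeric(scale(x)))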

# visual check that no missing values remain (missmap comes from the Amelia package)
missmap(noNAData, main = "Missing values vs observed")

attach(noNAData)

is.factor(workclass)
## [1] TRUE
is.factor(race)
## [1] TRUE
is.factor(sex)
## [1] TRUE
is.factor(marital_status)
## [1] TRUE
is.factor(occupation)
## [1] TRUE
is.factor(education)
## [1] TRUE
is.factor(relationship)
## [1] TRUE
contrasts(workclass)
##                    Local-gov  Private  Self-emp-inc  Self-emp-not-inc
##  Federal-gov               0        0             0                 0
##  Local-gov                 1        0             0                 0
##  Private                   0        1             0                 0
##  Self-emp-inc              0        0             1                 0
##  Self-emp-not-inc          0        0             0                 1
##  State-gov                 0        0             0                 0
##  Without-pay               0        0             0                 0
##                    State-gov  Without-pay
##  Federal-gov               0            0
##  Local-gov                 0            0
##  Private                   0            0
##  Self-emp-inc              0            0
##  Self-emp-not-inc          0            0
##  State-gov                 1            0
##  Without-pay               0            1
contrasts(race)
##                      Asian-Pac-Islander  Black  Other  White
##  Amer-Indian-Eskimo                   0      0      0      0
##  Asian-Pac-Islander                   1      0      0      0
##  Black                                0      1      0      0
##  Other                                0      0      1      0
##  White                                0      0      0      1
contrasts(sex)
##          Male
##  Female     0
##  Male       1
contrasts(marital_status)
##                         Married-AF-spouse  Married-civ-spouse
##  Divorced                               0                   0
##  Married-AF-spouse                      1                   0
##  Married-civ-spouse                     0                   1
##  Married-spouse-absent                  0                   0
##  Never-married                          0                   0
##  Separated                              0                   0
##  Widowed                                0                   0
##                         Married-spouse-absent  Never-married  Separated
##  Divorced                                   0              0          0
##  Married-AF-spouse                          0              0          0
##  Married-civ-spouse                         0              0          0
##  Married-spouse-absent                      1              0          0
##  Never-married                              0              1          0
##  Separated                                  0              0          1
##  Widowed                                    0              0          0
##                         Widowed
##  Divorced                     0
##  Married-AF-spouse            0
##  Married-civ-spouse           0
##  Married-spouse-absent        0
##  Never-married                0
##  Separated                    0
##  Widowed                      1
contrasts(occupation)
##                     Armed-Forces  Craft-repair  Exec-managerial
##  Adm-clerical                  0             0                0
##  Armed-Forces                  1             0                0
##  Craft-repair                  0             1                0
##  Exec-managerial               0             0                1
##  Farming-fishing               0             0                0
##  Handlers-cleaners             0             0                0
##  Machine-op-inspct             0             0                0
##  Other-service                 0             0                0
##  Priv-house-serv               0             0                0
##  Prof-specialty                0             0                0
##  Protective-serv               0             0                0
##  Sales                         0             0                0
##  Tech-support                  0             0                0
##  Transport-moving              0             0                0
##                     Farming-fishing  Handlers-cleaners  Machine-op-inspct
##  Adm-clerical                     0                  0                  0
##  Armed-Forces                     0                  0                  0
##  Craft-repair                     0                  0                  0
##  Exec-managerial                  0                  0                  0
##  Farming-fishing                  1                  0                  0
##  Handlers-cleaners                0                  1                  0
##  Machine-op-inspct                0                  0                  1
##  Other-service                    0                  0                  0
##  Priv-house-serv                  0                  0                  0
##  Prof-specialty                   0                  0                  0
##  Protective-serv                  0                  0                  0
##  Sales                            0                  0                  0
##  Tech-support                     0                  0                  0
##  Transport-moving                 0                  0                  0
##                     Other-service  Priv-house-serv  Prof-specialty
##  Adm-clerical                   0                0               0
##  Armed-Forces                   0                0               0
##  Craft-repair                   0                0               0
##  Exec-managerial                0                0               0
##  Farming-fishing                0                0               0
##  Handlers-cleaners              0                0               0
##  Machine-op-inspct              0                0               0
##  Other-service                  1                0               0
##  Priv-house-serv                0                1               0
##  Prof-specialty                 0                0               1
##  Protective-serv                0                0               0
##  Sales                          0                0               0
##  Tech-support                   0                0               0
##  Transport-moving               0                0               0
##                     Protective-serv  Sales  Tech-support  Transport-moving
##  Adm-clerical                     0      0             0                 0
##  Armed-Forces                     0      0             0                 0
##  Craft-repair                     0      0             0                 0
##  Exec-managerial                  0      0             0                 0
##  Farming-fishing                  0      0             0                 0
##  Handlers-cleaners                0      0             0                 0
##  Machine-op-inspct                0      0             0                 0
##  Other-service                    0      0             0                 0
##  Priv-house-serv                  0      0             0                 0
##  Prof-specialty                   0      0             0                 0
##  Protective-serv                  1      0             0                 0
##  Sales                            0      1             0                 0
##  Tech-support                     0      0             1                 0
##  Transport-moving                 0      0             0                 1
contrasts(education)
##                11th  12th  1st-4th  5th-6th  7th-8th  9th  Assoc-acdm
##  10th             0     0        0        0        0    0           0
##  11th             1     0        0        0        0    0           0
##  12th             0     1        0        0        0    0           0
##  1st-4th          0     0        1        0        0    0           0
##  5th-6th          0     0        0        1        0    0           0
##  7th-8th          0     0        0        0        1    0           0
##  9th              0     0        0        0        0    1           0
##  Assoc-acdm       0     0        0        0        0    0           1
##  Assoc-voc        0     0        0        0        0    0           0
##  Bachelors        0     0        0        0        0    0           0
##  Doctorate        0     0        0        0        0    0           0
##  HS-grad          0     0        0        0        0    0           0
##  Masters          0     0        0        0        0    0           0
##  Preschool        0     0        0        0        0    0           0
##  Prof-school      0     0        0        0        0    0           0
##  Some-college     0     0        0        0        0    0           0
##                Assoc-voc  Bachelors  Doctorate  HS-grad  Masters
##  10th                  0          0          0        0        0
##  11th                  0          0          0        0        0
##  12th                  0          0          0        0        0
##  1st-4th               0          0          0        0        0
##  5th-6th               0          0          0        0        0
##  7th-8th               0          0          0        0        0
##  9th                   0          0          0        0        0
##  Assoc-acdm            0          0          0        0        0
##  Assoc-voc             1          0          0        0        0
##  Bachelors             0          1          0        0        0
##  Doctorate             0          0          1        0        0
##  HS-grad               0          0          0        1        0
##  Masters               0          0          0        0        1
##  Preschool             0          0          0        0        0
##  Prof-school           0          0          0        0        0
##  Some-college          0          0          0        0        0
##                Preschool  Prof-school  Some-college
##  10th                  0            0             0
##  11th                  0            0             0
##  12th                  0            0             0
##  1st-4th               0            0             0
##  5th-6th               0            0             0
##  7th-8th               0            0             0
##  9th                   0            0             0
##  Assoc-acdm            0            0             0
##  Assoc-voc             0            0             0
##  Bachelors             0            0             0
##  Doctorate             0            0             0
##  HS-grad               0            0             0
##  Masters               0            0             0
##  Preschool             1            0             0
##  Prof-school           0            1             0
##  Some-college          0            0             1
contrasts(relationship)
##                  Not-in-family  Other-relative  Own-child  Unmarried  Wife
##  Husband                     0               0          0          0     0
##  Not-in-family               1               0          0          0     0
##  Other-relative              0               1          0          0     0
##  Own-child                   0               0          1          0     0
##  Unmarried                   0               0          0          1     0
##  Wife                        0               0          0          0     1
# Take a backup
noNAData.bk<-noNAData
# data frame with factors converted into numeric integer level codes
# (note: the codes impose an arbitrary ordering on nominal levels)
noNAData.num<-noNAData
fac.vars <- c("workclass","education","marital_status","occupation","relationship","race","sex","native_country")
noNAData.num[fac.vars] <- lapply(noNAData[fac.vars], as.numeric)

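The code above prepares the data for the analyses below. First we read both files, adult.data (local copy adult.data.1) and adult.test, and harmonize the income labels of the test file. The two files are then merged into one data frame, dropping the fnlwgt column as instructed in the preface. Next, the “?” placeholders are replaced with NA and the rows containing them are removed. The numeric variables are then standardized, and missmap is used to confirm that no missing values remain. Finally, a copy of the data with the factor columns converted into integer codes is stored in noNAData.num.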


Problem 1: univariate and unsupervised analysis (20 points)

Download and read “Census Income” data into R and prepare graphical and numerical summaries of it: e.g. histograms of continuous attributes, contingency tables of categorical variables, scatterplots of continuous attributes with some of the categorical variables indicated by color/symbol shape, etc. Perform principal components analysis of this data (do you need to scale it prior to that? how would you represent multilevel categorical attributes to be used as inputs for PCA?) and plot observations in the space of the first few principal components with subjects’ gender and/or categorized income indicated by color/shape of the symbol. Perform univariate assessment of associations between the outcome we will be modeling and each of the attributes (e.g. t-test or logistic regression for continuous attributes, contingency tables/Fisher exact test/\(\chi^2\) test for categorical attributes). Summarize your observations from these assessments: does it appear that there is an association between the outcome and the predictors? Which predictors seem to be more/less relevant?
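The univariate tests listed above can be sketched on simulated toy data. The data-generating model below is an assumption for illustration only, not the census data. A continuous attribute (age) is assessed with a Welch t-test and a univariate logistic regression. A categorical attribute (sex) is assessed with a contingency table, a \(\chi^2\) test and a Fisher exact test.

```r
set.seed(1)
n <- 400
toy <- data.frame(sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
                  age = round(runif(n, 17, 90)))
# Simulated outcome (assumed toy structure): the chance of >50K rises with age and for males
p <- plogis(-4 + 0.06 * toy$age + 0.8 * (toy$sex == "Male"))
toy$salary <- factor(ifelse(runif(n) < p, ">50K", "<=50K"), levels = c("<=50K", ">50K"))

# Continuous attribute vs outcome: Welch two-sample t-test and univariate logistic regression
t.res   <- t.test(age ~ salary, data = toy)
glm.res <- glm(salary ~ age, data = toy, family = binomial)

# Categorical attribute vs outcome: contingency table with chi-squared and Fisher exact tests
tab        <- table(toy$sex, toy$salary)
chi.res    <- chisq.test(tab)
fisher.res <- fisher.test(tab)

p.values <- c(t.test = t.res$p.value,
              logistic = coef(summary(glm.res))["age", "Pr(>|z|)"],
              chisq = chi.res$p.value,
              fisher = fisher.res$p.value)
signif(p.values, 3)
```

For large tables such as native_country versus salary, fisher.test may need `simulate.p.value = TRUE` or a larger `workspace` to finish.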


The continuous attributes are age, education_num, capital_gain, capital_loss and hours_per_week; the categorical attributes are workclass, education, marital_status, occupation, relationship, race, sex and native_country.

Now we draw histograms of the continuous attributes and build contingency tables of the categorical variables.


# analyze raw (unscaled) data: merged.data keeps the original units,
# whereas the attached noNAData was standardized above

ggplot(merged.data, aes(x = age)) + geom_histogram(bins = 30, na.rm = TRUE)

ggplot(merged.data, aes(x = education_num)) + geom_histogram(bins = 30, na.rm = TRUE)

ggplot(merged.data, aes(x = capital_gain)) + geom_histogram(bins = 30, na.rm = TRUE)

ggplot(merged.data, aes(x = capital_loss)) + geom_histogram(bins = 30, na.rm = TRUE)

ggplot(merged.data, aes(x = hours_per_week)) + geom_histogram(bins = 30, na.rm = TRUE)


The histograms of the continuous attributes show that age ranges from 17 to 90 years, that capital gain and capital loss are zero for the large majority of observations, and that most people work 40 hours per week. Next we build contingency tables for the categorical attributes.
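The histogram observations above can also be quantified numerically. Here is a small sketch of a per-column summary with a hypothetical helper, `summarize_numeric`. The toy rows are copied from the first five records shown by head(merged.data) above. On the real data it would be called as `summarize_numeric(merged.data)`.

```r
# Hypothetical helper: min, median, max and share of zeros for every numeric column
summarize_numeric <- function(df) {
  num <- df[sapply(df, is.numeric)]
  t(sapply(num, function(x) c(min = min(x, na.rm = TRUE),
                                median = median(x, na.rm = TRUE),
                                max = max(x, na.rm = TRUE),
                                share.zero = mean(x == 0, na.rm = TRUE))))
}

# Toy rows copied from the first five records shown by head(merged.data)
toy <- data.frame(age = c(39, 50, 38, 53, 28),
                  capital_gain = c(2174, 0, 0, 0, 0),
                  hours_per_week = c(40, 13, 40, 40, 40))
summarize_numeric(toy)
```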


table(sex)
## sex
##  Female    Male 
##   14694   30527
table(education)
## education
##          10th          11th          12th       1st-4th       5th-6th 
##          1223          1619           577           222           449 
##       7th-8th           9th    Assoc-acdm     Assoc-voc     Bachelors 
##           823           676          1507          1959          7570 
##     Doctorate       HS-grad       Masters     Preschool   Prof-school 
##           544         14783          2514            72           785 
##  Some-college 
##          9898
table(workclass)
## workclass
##       Federal-gov         Local-gov           Private      Self-emp-inc 
##              1406              3100             33306              1646 
##  Self-emp-not-inc         State-gov       Without-pay 
##              3796              1946                21
table(marital_status)
## marital_status
##               Divorced      Married-AF-spouse     Married-civ-spouse 
##                   6297                     32                  21055 
##  Married-spouse-absent          Never-married              Separated 
##                    552                  14597                   1411 
##                Widowed 
##                   1277
table(occupation)
## occupation
##       Adm-clerical       Armed-Forces       Craft-repair 
##               5540                 14               6020 
##    Exec-managerial    Farming-fishing  Handlers-cleaners 
##               5984               1480               2046 
##  Machine-op-inspct      Other-service    Priv-house-serv 
##               2969               4808                232 
##     Prof-specialty    Protective-serv              Sales 
##               6008                976               5408 
##       Tech-support   Transport-moving 
##               1420               2316
table(relationship)
## relationship
##         Husband   Not-in-family  Other-relative       Own-child 
##           18666           11702            1348            6626 
##       Unmarried            Wife 
##            4788            2091
table(race)
## race
##  Amer-Indian-Eskimo  Asian-Pac-Islander               Black 
##                 435                1303                4228 
##               Other               White 
##                 353               38902

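The contingency tables above show how the observations are distributed across the levels of each categorical attribute. Several levels are very sparsely populated, e.g. Without-pay (21) in workclass, Armed-Forces (14) in occupation, Married-AF-spouse (32) in marital_status and Preschool (72) in education. Next we draw scatterplots of some continuous attributes against categorical attributes.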
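Sparse levels can be flagged programmatically rather than by eye. The sketch below rebuilds the workclass factor from the counts printed by table(workclass) above. It then reports percentages per level and the levels below a threshold. The threshold of 100 and the helper name `sparse_levels` are hypothetical choices.

```r
# Rebuild the workclass factor from the counts printed by table(workclass)
workclass.counts <- c("Federal-gov" = 1406, "Local-gov" = 3100, "Private" = 33306,
                      "Self-emp-inc" = 1646, "Self-emp-not-inc" = 3796,
                      "State-gov" = 1946, "Without-pay" = 21)
wc <- factor(rep(names(workclass.counts), workclass.counts))

# Hypothetical helper: levels whose count falls below a chosen threshold
sparse_levels <- function(f, min.n) {
  counts <- table(f)
  counts[counts < min.n]
}

round(100 * prop.table(table(wc)), 2)   # percentage of observations per level
sparse_levels(wc, min.n = 100)
```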


# Scatterplot of education_num by education level (shape), colored by sex

ggplot(merged.data, aes(x=education, y=education_num, shape=education, color=sex)) +
  geom_point()+scale_shape_manual(values=seq(0,15))

# Scatterplot of hours_per_week against education_num, colored by education
ggplot(merged.data, aes(x=education_num, y=hours_per_week, color=education)) +
  geom_point()

# Scatterplot of hours_per_week by marital_status
ggplot(merged.data, aes(x=marital_status, y=hours_per_week, color=marital_status)) +
  geom_point()

# Scatterplot of capital_gain against education, colored by sex
ggplot(merged.data, aes(x=capital_gain, y=education, color=sex)) +
  geom_point()

# The regression below (commented out) would show the association between the response variable and the predictors
#summary(lm(as.numeric(salary)~.,data=merged.data,na.action=na.omit))
#outModel<-model.matrix(~ sex + education+workclass, data=merged.data, contrasts.arg=list(sex=diag(nlevels(sex)), education=diag(nlevels(education)),workclass=diag(nlevels(workclass)),marital_status=diag(nlevels(marital_status)),occupation=diag(nlevels(occupation)),relationship=diag(nlevels(relationship)),race=diag(nlevels(race)),native_country=diag(nlevels(native_country))))
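# PCA of the merged (unscaled) data: model.matrix expands each multi-level factor into 0/1 dummy
# columns and drops rows with NA; zero-variance columns (the intercept) are removed before scaling
modelOut<-model.matrix(salary ~ ., data = merged.data)

# prcomp has no na.rm argument; scale. = TRUE standardizes every column before the rotation
pca.out<-prcomp(modelOut[ , apply(modelOut, 2, var) != 0], scale. = TRUE)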
#pca.out<-prcomp(model.matrix(salary ~ ., data = merged.data),na.rm = TRUE)
#pca.out
#center and scale refers to respective mean and standard deviation of the variables that are used for normalization prior to implementing PCA

#outputs the mean of variables
#pca.out$center

#outputs the standard deviation of variables
#pca.out$scale

#rotation measure provides the principal component loading. Each column of rotation matrix contains the principal component loading vector.
#pca.out$rotation

# pca.out$x holds the principal component scores (one row per observation, one column per PC)
dim(pca.out$x)
## [1] 45221    94
biplot(pca.out, scale = 0)
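To decide how many components are worth inspecting, the proportion of variance explained can be computed from the component standard deviations. The sketch below repeats the same pipeline (dummy encoding, zero-variance filter, scaled prcomp) on simulated toy data. The columns and their distributions are assumptions for illustration. On the real data the same formula would apply to `pca.out$sdev`.

```r
set.seed(2)
n <- 200
toy <- data.frame(age = round(runif(n, 17, 90)),
                  hours_per_week = round(rnorm(n, 40, 10)),
                  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
                  race = factor(sample(c("Black", "Other", "White"), n, replace = TRUE)))

X <- model.matrix(~ ., data = toy)        # dummy-encode the factors
X <- X[, apply(X, 2, var) != 0]           # drop zero-variance columns (the intercept)
pca.toy <- prcomp(X, scale. = TRUE)       # standardize: columns are on very different scales

pve <- pca.toy$sdev^2 / sum(pca.toy$sdev^2)   # proportion of variance explained per PC
round(cumsum(pve), 3)
# summary(pca.toy)$importance reports the same proportions (rounded)
```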

# plot of the observations in the space of PC1 and PC2
plot(pca.out$x[,1:2])
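The problem statement asks for the PC plot with gender and/or income indicated by symbol color or shape. Because model.matrix drops rows with NA, the labels must be aligned with the rows it kept. The sketch below does this through the row names, using simulated toy data (an assumption for illustration). On the real data, the analogous labels would presumably be `merged.data[rownames(modelOut), "salary"]` and `merged.data[rownames(modelOut), "sex"]`.

```r
set.seed(3)
n <- 300
toy <- data.frame(age = round(runif(n, 17, 90)),
                  education_num = sample(1:16, n, replace = TRUE),
                  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
                  salary = factor(sample(c("<=50K", ">50K"), n, replace = TRUE)))
toy$age[c(5, 17)] <- NA                          # mimic rows that model.matrix will drop

X <- model.matrix(salary ~ ., data = toy)        # NA rows are dropped (default na.action)
X <- X[, apply(X, 2, var) != 0]                  # remove the constant intercept column
pca.toy <- prcomp(X, scale. = TRUE)

# model.matrix keeps the original row names, so they align the labels with the kept rows
kept <- as.integer(rownames(X))
plot(pca.toy$x[, 1:2],
     col = ifelse(toy$salary[kept] == ">50K", "red", "blue"),
     pch = ifelse(toy$sex[kept] == "Male", 17, 1),
     main = "Observations in the space of PC1 and PC2")
legend("topright", legend = c(">50K", "<=50K", "Male", "Female"),
       col = c("red", "blue", "black", "black"), pch = c(16, 16, 17, 1))
```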

# squared PC1 loadings in decreasing order (all variables are printed; wrap in head(..., 10) for the top 10)
sort(pca.out$rotation[,1]^2,decreasing=TRUE)
##         marital_status Married-civ-spouse 
##                          0.12100919885903 
##              marital_status Never-married 
##                          0.09572314053910 
##                             education_num 
##                          0.09337405025801 
##                                       age 
##                          0.06515874947188 
##                    relationship Own-child 
##                          0.06144529658745 
##                         workclass Private 
##                          0.06091603349574 
##                            hours_per_week 
##                          0.04808387377377 
##                 occupation Prof-specialty 
##                          0.03677811283382 
##                  occupation Other-service 
##                          0.03531619989403 
##                                  sex Male 
##                          0.03529978662821 
##                       education Bachelors 
##                          0.02459180553456 
##                         education Masters 
##                          0.02255568018513 
##                occupation Exec-managerial 
##                          0.02253853420721 
##                                race White 
##                          0.02252612874413 
##                workclass Self-emp-not-inc 
##                          0.02058712821548 
##                                race Black 
##                          0.02052976885101 
##                    workclass Self-emp-inc 
##                          0.01942700192919 
##                     education Prof-school 
##                          0.01605850012832 
##                              capital_gain 
##                          0.01263615083136 
##                            education 11th 
##                          0.01186491384772 
##                       education Doctorate 
##                          0.01057698940970 
##              occupation Handlers-cleaners 
##                          0.00993882465299 
##                    relationship Unmarried 
##                          0.00980569140425 
##              native_country United-States 
##                          0.00956512200188 
##               relationship Other-relative 
##                          0.00944614789949 
##                     native_country Mexico 
##                          0.00862008068027 
##                       workclass Local-gov 
##                          0.00848714237035 
##                         education HS-grad 
##                          0.00846032225258 
##                relationship Not-in-family 
##                          0.00841755607441 
##                              capital_loss 
##                          0.00710855260751 
##              occupation Machine-op-inspct 
##                          0.00612297248830 
##                         education 5th-6th 
##                          0.00508876482106 
##                    education Some-college 
##                          0.00441212972034 
##                       workclass State-gov 
##                          0.00408633873808 
##                  marital_status Separated 
##                          0.00399939738249 
##                         relationship Wife 
##                          0.00379152235034 
##                            education 12th 
##                          0.00344299663247 
##                occupation Priv-house-serv 
##                          0.00343209168229 
##                             education 9th 
##                          0.00302496996257 
##                         education 1st-4th 
##                          0.00290182694383 
##                                race Other 
##                          0.00276922143725 
##                native_country El-Salvador 
##                          0.00210076923169 
##                       education Preschool 
##                          0.00156687236331 
##      marital_status Married-spouse-absent 
##                          0.00153866700923 
##                  native_country Guatemala 
##                          0.00138905837853 
##                    native_country Jamaica 
##                          0.00126641311593 
##         native_country Dominican-Republic 
##                          0.00121606660867 
##                occupation Protective-serv 
##                          0.00118817263750 
##                         education 7th-8th 
##                          0.00117500190304 
##                      native_country Haiti 
##                          0.00102087930751 
##                native_country Puerto-Rico 
##                          0.00093066160700 
##                    marital_status Widowed 
##                          0.00070749976192 
##                occupation Farming-fishing 
##                          0.00064837501097 
##                    native_country Vietnam 
##                          0.00059069316107 
##                native_country Philippines 
##                          0.00053197777105 
##                       education Assoc-voc 
##                          0.00052937765618 
##                   race Asian-Pac-Islander 
##                          0.00052073556388 
##                      education Assoc-acdm 
##                          0.00046313973386 
##                  native_country Nicaragua 
##                          0.00035422153742 
##                   native_country Columbia 
##                          0.00020785074890 
##                   occupation Craft-repair 
##                          0.00019004500653 
##            native_country Trinadad&Tobago 
##                          0.00017026048074 
##                       native_country Peru 
##                          0.00015662803944 
##                          occupation Sales 
##                          0.00015074115200 
##                    native_country Ecuador 
##                          0.00015060499127 
##                      native_country India 
##                          0.00014989241861 
##                   native_country Honduras 
##                          0.00013518036823 
##                     native_country Taiwan 
##                          0.00013414416108 
##                       native_country Laos 
##                          0.00012113784026 
##                   native_country Portugal 
##                          0.00011843875804 
##                       native_country Iran 
##                          0.00011275413845 
## native_country Outlying-US(Guam-USVI-etc) 
##                          0.00010380241843 
##                     native_country Greece 
##                          0.00009941741569 
##                     native_country Canada 
##                          0.00007309876965 
##                     native_country France 
##                          0.00005355600938 
##                    native_country England 
##                          0.00004079613305 
##                    native_country Ireland 
##                          0.00003040508972 
##                    native_country Hungary 
##                          0.00002612568436 
##                   native_country Thailand 
##                          0.00002107465398 
##                     native_country Poland 
##                          0.00001640129549 
##                      native_country South 
##                          0.00001610077479 
##                    native_country Germany 
##                          0.00001521433284 
##                       native_country Hong 
##                          0.00001294478503 
##                   occupation Tech-support 
##                          0.00001203019074 
##               occupation Transport-moving 
##                          0.00000868826311 
##                      native_country China 
##                          0.00000489630559 
##                   occupation Armed-Forces 
##                          0.00000279644699 
##                   native_country Scotland 
##                          0.00000253223954 
##                      native_country Italy 
##                          0.00000189433954 
##                       native_country Cuba 
##                          0.00000152235966 
##          marital_status Married-AF-spouse 
##                          0.00000104279005 
##                     workclass Without-pay 
##                          0.00000044248292 
##                 native_country Yugoslavia 
##                          0.00000019940312 
##                      native_country Japan 
##                          0.00000004102641
#all attributes of PC2 in decreasing order of squared loading; wrap in head(..., 10) to list only the top 10
sort(pca.out$rotation[,2]^2,decreasing=TRUE)
##                             education_num 
##                            0.153675547109 
##         marital_status Married-civ-spouse 
##                            0.092078036806 
##              marital_status Never-married 
##                            0.088055140710 
##              native_country United-States 
##                            0.054495939716 
##                 occupation Prof-specialty 
##                            0.050281611464 
##                                  sex Male 
##                            0.049819305110 
##                     native_country Mexico 
##                            0.045887001723 
##                       education Bachelors 
##                            0.042088471867 
##                relationship Not-in-family 
##                            0.041526113041 
##                         education HS-grad 
##                            0.034926559397 
##                                       age 
##                            0.034309836719 
##                   occupation Craft-repair 
##                            0.033204048328 
##                         education 5th-6th 
##                            0.031625974086 
##                    relationship Own-child 
##                            0.028444045368 
##                         education 7th-8th 
##                            0.024325703148 
##                         education 1st-4th 
##                            0.018482173210 
##                         education Masters 
##                            0.016467539944 
##              occupation Machine-op-inspct 
##                            0.014127625405 
##                occupation Farming-fishing 
##                            0.014000280274 
##               occupation Transport-moving 
##                            0.012804861940 
##                             education 9th 
##                            0.010922920543 
##                       workclass State-gov 
##                            0.009546828386 
##                workclass Self-emp-not-inc 
##                            0.008219859768 
##                            hours_per_week 
##                            0.007931369023 
##                       workclass Local-gov 
##                            0.007499857552 
##                    education Some-college 
##                            0.005859703301 
##                       education Doctorate 
##                            0.004205018821 
##                                race Other 
##                            0.004155373216 
##                       education Preschool 
##                            0.004050356428 
##                native_country El-Salvador 
##                            0.003966561420 
##                     education Prof-school 
##                            0.003746447659 
##                occupation Exec-managerial 
##                            0.003118614137 
##                      education Assoc-acdm 
##                            0.002931217726 
##               relationship Other-relative 
##                            0.002929457015 
##                   occupation Tech-support 
##                            0.002808244597 
##                   native_country Portugal 
##                            0.002781979313 
##                native_country Puerto-Rico 
##                            0.002680038828 
##                  native_country Guatemala 
##                            0.002610164436 
##         native_country Dominican-Republic 
##                            0.002444602570 
##                      native_country Italy 
##                            0.002423624820 
##              occupation Handlers-cleaners 
##                            0.002120961904 
##                          occupation Sales 
##                            0.002077514894 
##                         workclass Private 
##                            0.001657317523 
##                       native_country Cuba 
##                            0.001407572798 
##      marital_status Married-spouse-absent 
##                            0.001346070940 
##                occupation Priv-house-serv 
##                            0.001223218088 
##                                race Black 
##                            0.001193052087 
##                   race Asian-Pac-Islander 
##                            0.000905655231 
##                   native_country Columbia 
##                            0.000877723172 
##                native_country Philippines 
##                            0.000866153938 
##                            education 11th 
##                            0.000825956590 
##                    native_country Ecuador 
##                            0.000745687004 
##                      native_country Haiti 
##                            0.000718631827 
##                     native_country Greece 
##                            0.000705537699 
##                     native_country Poland 
##                            0.000667866961 
##                    relationship Unmarried 
##                            0.000667184016 
##                    native_country Vietnam 
##                            0.000621013539 
##                              capital_gain 
##                            0.000507609683 
##                       native_country Laos 
##                            0.000479030184 
##                  native_country Nicaragua 
##                            0.000451781074 
##                      native_country China 
##                            0.000425993011 
##                     native_country Canada 
##                            0.000392148130 
##                      native_country South 
##                            0.000385627861 
##                 native_country Yugoslavia 
##                            0.000342690549 
##                       education Assoc-voc 
##                            0.000225841682 
##            native_country Trinadad&Tobago 
##                            0.000224789378 
##                  occupation Other-service 
##                            0.000219390054 
##                            education 12th 
##                            0.000189343134 
##                       native_country Peru 
##                            0.000171580535 
##                   native_country Honduras 
##                            0.000170691426 
##                    workclass Self-emp-inc 
##                            0.000156700765 
##                    native_country Ireland 
##                            0.000144313907 
##                       native_country Hong 
##                            0.000136951233 
##                    native_country Germany 
##                            0.000124694610 
##                     workclass Without-pay 
##                            0.000121017646 
##                    marital_status Widowed 
##                            0.000119824884 
##                    native_country Jamaica 
##                            0.000117147656 
##                      native_country Japan 
##                            0.000106859685 
##                         relationship Wife 
##                            0.000105830302 
##                   native_country Thailand 
##                            0.000099156329 
##                occupation Protective-serv 
##                            0.000085429108 
##                   native_country Scotland 
##                            0.000077207882 
##                     native_country Taiwan 
##                            0.000060434185 
##                    native_country England 
##                            0.000055459906 
##                              capital_loss 
##                            0.000041399485 
##                    native_country Hungary 
##                            0.000038010100 
##                       native_country Iran 
##                            0.000035831003 
##                      native_country India 
##                            0.000032878240 
##          marital_status Married-AF-spouse 
##                            0.000027009349 
##                   occupation Armed-Forces 
##                            0.000023954116 
## native_country Outlying-US(Guam-USVI-etc) 
##                            0.000009759741 
##                                race White 
##                            0.000004679474 
##                  marital_status Separated 
##                            0.000002579003 
##                     native_country France 
##                            0.000001181557
#plot observations in the space of the first two principal components, colored by sex
sex.f<-factor(merged.data$sex)
plot(pca.out$x[,1:2],col=c("red","blue")[as.numeric(sex.f)],pch=as.numeric(sex.f))
legend("topleft",levels(sex.f),pch=1:2,col=c("red","blue"),text.col=c("red","blue"))

#plot observations in the space of the first two principal components, colored by salary
salary.f<-factor(merged.data$salary)
plot(pca.out$x[,1:2],col=c("red","blue")[as.numeric(salary.f)],pch=as.numeric(salary.f))
legend("topleft",levels(salary.f),pch=1:2,col=c("red","blue"),text.col=c("red","blue"))


We included categorical attributes such as education, sex and marital_status by converting each one into a set of 0/1 dummy indicator variables (one per non-baseline level), a common way to turn a categorical input into numeric columns that methods such as PCA can use.
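A minimal sketch of that conversion on a hypothetical four-row data frame (not the census data): `model.matrix()` creates one 0/1 column per non-baseline level and leaves out the first level of each factor as the baseline.

```r
# Hypothetical toy rows standing in for a few census attributes
toy <- data.frame(
  age       = c(25, 38, 52, 41),
  sex       = factor(c("Female", "Male", "Male", "Female")),
  workclass = factor(c("Private", "State-gov", "Private", "Self-emp-inc"))
)

# Expand factors into 0/1 indicator columns; [, -1] drops the intercept column.
# The first level of each factor (Female, Private) is the baseline and gets no column.
X <- model.matrix(~ ., data = toy)[, -1]
print(X)
```

In the census data the factor levels carry a leading space (e.g. " Local-gov"), which is why the dummy names in this report read like `workclass Local-gov`.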

From the output above, the five attributes with the largest squared loadings (i.e. the largest weights) on PC1 are: marital_status Married-civ-spouse (0.1240), marital_status Never-married (0.0963), education_num (0.0869), age (0.0648) and relationship Own-child (0.0614).

For PC2, the five attributes with the largest squared loadings are: education_num (0.1537), marital_status Married-civ-spouse (0.0921), marital_status Never-married (0.0881), native_country United-States (0.0545) and occupation Prof-specialty (0.0503), with sex Male (0.0498) a close sixth.
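Why squared loadings work as weights: each column of the rotation matrix returned by `prcomp()` is a unit-length vector, so its squared entries are non-negative and add up to 1, and each one is the share of that loading vector's squared length contributed by one attribute. The sketch below checks this on R's built-in `USArrests` data, used here only as a stand-in for the census dummy matrix, and uses `head()` to keep just the top entries.

```r
# USArrests (built into R) stands in for the census dummy matrix
pca <- prcomp(USArrests, scale. = TRUE)

# Each column of the rotation matrix is a unit-length loading vector,
# so its squared entries are non-negative and sum to 1
w1 <- pca$rotation[, 1]^2
print(round(w1, 3))
print(sum(w1))

# Largest contributors to PC1; head() limits the output to the top 2
top.pc1 <- head(sort(w1, decreasing = TRUE), 2)
print(top.pc1)
```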

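This suggests that marital_status Married-civ-spouse, marital_status Never-married and education_num are the attributes most likely to affect salary. Since PCA does not use the outcome, this is only an indication; the logistic regression below tests the association directly.

***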

Problem 2: logistic regression (25 points)

Develop a logistic regression model of the outcome as a function of multiple predictors. Which variables are significantly associated with the outcome? Test model performance on multiple splits of the data into training and test subsets, summarize it in terms of accuracy/error and sensitivity/specificity, and compare it to the performance of other methods reported in the dataset description.
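A minimal base-R sketch of the repeated-split evaluation described above, on simulated data (the data frame `d`, its columns and the class labels "low"/"high" are hypothetical). It computes error, sensitivity and specificity by hand with "high" as the positive class, rather than relying on caret, so the direction of each metric is explicit.

```r
set.seed(42)

# Simulated stand-in for the census data: two numeric predictors and a
# binary outcome whose second factor level ("high") plays the role of " >50K"
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
p.true <- plogis(-0.5 + 1.2 * d$x1 - 0.8 * d$x2)
d$y <- factor(ifelse(runif(n) < p.true, "high", "low"), levels = c("low", "high"))

n.rep <- 20
err <- sens <- spec <- numeric(n.rep)
for (i in seq_len(n.rep)) {
  # 25% of rows for testing, the remaining 75% for training
  test.idx <- sample(nrow(d), size = 0.25 * nrow(d))
  train <- d[-test.idx, ]
  test <- d[test.idx, ]

  fit <- glm(y ~ x1 + x2, data = train, family = binomial)
  # predicted probability of the second level, "high"
  prob <- predict(fit, newdata = test, type = "response")

  pred.high <- prob > 0.5
  actual.high <- test$y == "high"
  err[i] <- mean(pred.high != actual.high)
  # "high" treated as the positive class
  sens[i] <- sum(pred.high & actual.high) / sum(actual.high)
  spec[i] <- sum(!pred.high & !actual.high) / sum(!actual.high)
}

print(summary(data.frame(error = err, sensitivity = sens, specificity = spec)))
```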

# logistic regression on whole data
glm.fit=glm(salary~.,data=noNAData,control=glm.control(epsilon = 1e-8, maxit = 50, trace = FALSE),family=binomial)
summary(glm.fit)
## 
## Call:
## glm(formula = salary ~ ., family = binomial, data = noNAData, 
##     control = glm.control(epsilon = 0.00000001, maxit = 50, trace = FALSE))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -5.0551  -0.5143  -0.1916  -0.0208   3.8486  
## 
## Coefficients: (1 not defined because of singularities)
##                                            Estimate Std. Error z value
## (Intercept)                               -4.114910   0.640615  -6.423
## age                                        0.327399   0.018411  17.783
## workclass Local-gov                       -0.638646   0.092487  -6.905
## workclass Private                         -0.486967   0.077123  -6.314
## workclass Self-emp-inc                    -0.349237   0.101236  -3.450
## workclass Self-emp-not-inc                -1.033051   0.090373 -11.431
## workclass State-gov                       -0.794305   0.101996  -7.788
## workclass Without-pay                     -1.376380   0.787450  -1.748
## education 11th                             0.096182   0.177925   0.541
## education 12th                             0.473130   0.225373   2.099
## education 1st-4th                         -0.488936   0.421520  -1.160
## education 5th-6th                         -0.260869   0.279729  -0.933
## education 7th-8th                         -0.467023   0.198927  -2.348
## education 9th                             -0.258011   0.222550  -1.159
## education Assoc-acdm                       1.406545   0.149984   9.378
## education Assoc-voc                        1.312350   0.144665   9.072
## education Bachelors                        1.966305   0.134958  14.570
## education Doctorate                        2.874407   0.184094  15.614
## education HS-grad                          0.843652   0.131413   6.420
## education Masters                          2.296984   0.143318  16.027
## education Preschool                       -4.963364   3.516624  -1.411
## education Prof-school                      2.905179   0.173474  16.747
## education Some-college                     1.202055   0.133315   9.017
## education_num                                    NA         NA      NA
## marital_status Married-AF-spouse           2.614753   0.484757   5.394
## marital_status Married-civ-spouse          2.296402   0.225347  10.191
## marital_status Married-spouse-absent       0.187137   0.187204   1.000
## marital_status Never-married              -0.422114   0.073214  -5.766
## marital_status Separated                  -0.007293   0.134736  -0.054
## marital_status Widowed                     0.121776   0.129867   0.938
## occupation Armed-Forces                    0.200785   0.891530   0.225
## occupation Craft-repair                    0.060327   0.065591   0.920
## occupation Exec-managerial                 0.780109   0.063219  12.340
## occupation Farming-fishing                -0.989079   0.115658  -8.552
## occupation Handlers-cleaners              -0.684971   0.115127  -5.950
## occupation Machine-op-inspct              -0.287707   0.083668  -3.439
## occupation Other-service                  -0.882103   0.097758  -9.023
## occupation Priv-house-serv                -1.988047   0.754023  -2.637
## occupation Prof-specialty                  0.522651   0.066758   7.829
## occupation Protective-serv                 0.498353   0.103267   4.826
## occupation Sales                           0.262431   0.067504   3.888
## occupation Tech-support                    0.560657   0.090539   6.192
## occupation Transport-moving               -0.089697   0.081313  -1.103
## relationship Not-in-family                 0.513363   0.222933   2.303
## relationship Other-relative               -0.484330   0.202716  -2.389
## relationship Own-child                    -0.580790   0.219134  -2.650
## relationship Unmarried                     0.336870   0.236386   1.425
## relationship Wife                          1.128630   0.085572  13.189
## race Asian-Pac-Islander                    0.913724   0.228724   3.995
## race Black                                 0.374231   0.190972   1.960
## race Other                                 0.510594   0.281169   1.816
## race White                                 0.564560   0.181550   3.110
## sex Male                                   0.717838   0.065202  11.009
## capital_gain                               2.391640   0.065547  36.488
## capital_loss                               0.261820   0.012639  20.716
## hours_per_week                             0.348198   0.016504  21.098
## native_country Canada                     -0.164305   0.583254  -0.282
## native_country China                      -1.512931   0.594731  -2.544
## native_country Columbia                   -2.885063   0.832196  -3.467
## native_country Cuba                       -0.519798   0.601975  -0.863
## native_country Dominican-Republic         -1.711068   0.773840  -2.211
## native_country Ecuador                    -1.121192   0.783790  -1.430
## native_country El-Salvador                -1.245184   0.683252  -1.822
## native_country England                    -0.280294   0.604734  -0.464
## native_country France                     -0.008739   0.698144  -0.013
## native_country Germany                    -0.602822   0.583203  -1.034
## native_country Greece                     -0.918270   0.657657  -1.396
## native_country Guatemala                  -1.236473   0.899057  -1.375
## native_country Haiti                      -0.438156   0.720537  -0.608
## native_country Honduras                   -0.657362   1.250473  -0.526
## native_country Hong                       -1.278501   0.788960  -1.620
## native_country Hungary                    -0.327678   0.794134  -0.413
## native_country India                      -1.150063   0.574227  -2.003
## native_country Iran                       -0.768290   0.653444  -1.176
## native_country Ireland                     0.098273   0.727965   0.135
## native_country Italy                      -0.120160   0.607382  -0.198
## native_country Jamaica                    -0.533367   0.659353  -0.809
## native_country Japan                      -1.033857   0.616636  -1.677
## native_country Laos                       -2.210085   0.997514  -2.216
## native_country Mexico                     -1.379802   0.571595  -2.414
## native_country Nicaragua                  -1.168116   0.841921  -1.387
## native_country Outlying-US(Guam-USVI-etc) -1.571173   1.198919  -1.310
## native_country Peru                       -1.577161   0.815533  -1.934
## native_country Philippines                -0.648910   0.556503  -1.166
## native_country Poland                     -0.712531   0.631853  -1.128
## native_country Portugal                   -0.009100   0.660699  -0.014
## native_country Puerto-Rico                -0.907674   0.617097  -1.471
## native_country Scotland                   -2.060590   0.980842  -2.101
## native_country South                      -2.194233   0.637861  -3.440
## native_country Taiwan                     -1.106390   0.655435  -1.688
## native_country Thailand                   -1.802995   0.840100  -2.146
## native_country Trinadad&Tobago            -2.017349   0.981409  -2.056
## native_country United-States              -0.583529   0.542016  -1.077
## native_country Vietnam                    -2.040055   0.714166  -2.857
## native_country Yugoslavia                 -0.017572   0.781595  -0.022
##                                                       Pr(>|z|)    
## (Intercept)                                0.00000000013328860 ***
## age                                       < 0.0000000000000002 ***
## workclass Local-gov                        0.00000000000501201 ***
## workclass Private                          0.00000000027161593 ***
## workclass Self-emp-inc                                0.000561 ***
## workclass Self-emp-not-inc                < 0.0000000000000002 ***
## workclass State-gov                        0.00000000000000683 ***
## workclass Without-pay                                 0.080482 .  
## education 11th                                        0.588802    
## education 12th                                        0.035788 *  
## education 1st-4th                                     0.246075    
## education 5th-6th                                     0.351039    
## education 7th-8th                                     0.018889 *  
## education 9th                                         0.246318    
## education Assoc-acdm                      < 0.0000000000000002 ***
## education Assoc-voc                       < 0.0000000000000002 ***
## education Bachelors                       < 0.0000000000000002 ***
## education Doctorate                       < 0.0000000000000002 ***
## education HS-grad                          0.00000000013641859 ***
## education Masters                         < 0.0000000000000002 ***
## education Preschool                                   0.158127    
## education Prof-school                     < 0.0000000000000002 ***
## education Some-college                    < 0.0000000000000002 ***
## education_num                                               NA    
## marital_status Married-AF-spouse           0.00000006892560322 ***
## marital_status Married-civ-spouse         < 0.0000000000000002 ***
## marital_status Married-spouse-absent                  0.317483    
## marital_status Never-married               0.00000000814056487 ***
## marital_status Separated                              0.956832    
## marital_status Widowed                                0.348399    
## occupation Armed-Forces                               0.821813    
## occupation Craft-repair                               0.357703    
## occupation Exec-managerial                < 0.0000000000000002 ***
## occupation Farming-fishing                < 0.0000000000000002 ***
## occupation Handlers-cleaners               0.00000000268633981 ***
## occupation Machine-op-inspct                          0.000585 ***
## occupation Other-service                  < 0.0000000000000002 ***
## occupation Priv-house-serv                            0.008374 ** 
## occupation Prof-specialty                  0.00000000000000492 ***
## occupation Protective-serv                 0.00000139378916839 ***
## occupation Sales                                      0.000101 ***
## occupation Tech-support                    0.00000000059243747 ***
## occupation Transport-moving                           0.269981    
## relationship Not-in-family                            0.021292 *  
## relationship Other-relative                           0.016885 *  
## relationship Own-child                                0.008040 ** 
## relationship Unmarried                                0.154132    
## relationship Wife                         < 0.0000000000000002 ***
## race Asian-Pac-Islander                    0.00006472640821200 ***
## race Black                                            0.050041 .  
## race Other                                            0.069375 .  
## race White                                            0.001873 ** 
## sex Male                                  < 0.0000000000000002 ***
## capital_gain                              < 0.0000000000000002 ***
## capital_loss                              < 0.0000000000000002 ***
## hours_per_week                            < 0.0000000000000002 ***
## native_country Canada                                 0.778170    
## native_country China                                  0.010962 *  
## native_country Columbia                               0.000527 ***
## native_country Cuba                                   0.387870    
## native_country Dominican-Republic                     0.027026 *  
## native_country Ecuador                                0.152581    
## native_country El-Salvador                            0.068389 .  
## native_country England                                0.643006    
## native_country France                                 0.990013    
## native_country Germany                                0.301305    
## native_country Greece                                 0.162632    
## native_country Guatemala                              0.169038    
## native_country Haiti                                  0.543123    
## native_country Honduras                               0.599103    
## native_country Hong                                   0.105127    
## native_country Hungary                                0.679883    
## native_country India                                  0.045199 *  
## native_country Iran                                   0.239693    
## native_country Ireland                                0.892614    
## native_country Italy                                  0.843176    
## native_country Jamaica                                0.418558    
## native_country Japan                                  0.093619 .  
## native_country Laos                                   0.026719 *  
## native_country Mexico                                 0.015781 *  
## native_country Nicaragua                              0.165307    
## native_country Outlying-US(Guam-USVI-etc)             0.190030    
## native_country Peru                                   0.053125 .  
## native_country Philippines                            0.243594    
## native_country Poland                                 0.259453    
## native_country Portugal                               0.989011    
## native_country Puerto-Rico                            0.141325    
## native_country Scotland                               0.035655 *  
## native_country South                                  0.000582 ***
## native_country Taiwan                                 0.091407 .  
## native_country Thailand                               0.031860 *  
## native_country Trinadad&Tobago                        0.039824 *  
## native_country United-States                          0.281664    
## native_country Vietnam                                0.004283 ** 
## native_country Yugoslavia                             0.982064    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 50644  on 45220  degrees of freedom
## Residual deviance: 29306  on 45127  degrees of freedom
## AIC: 29494
## 
## Number of Fisher Scoring iterations: 8
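The `NA` for education_num ("1 not defined because of singularities") arises because education_num is a fixed numeric code for each education level, so its column is an exact linear combination of the intercept and the education dummies. The sketch below reproduces the effect on hypothetical data; the variable names and codes are illustrative only.

```r
set.seed(7)

# Hypothetical education factor plus a numeric code that is a fixed
# function of it (like education / education_num in the census data)
edu <- factor(sample(c("HS-grad", "Bachelors", "Masters"), 300, replace = TRUE))
code <- c("HS-grad" = 9, "Bachelors" = 13, "Masters" = 14)
edu.num <- unname(code[as.character(edu)])
y <- rbinom(300, 1, plogis(-3 + 0.2 * edu.num))

fit <- glm(y ~ edu + edu.num, family = binomial)
# edu.num is an exact linear combination of the intercept and the edu dummies,
# so glm() cannot estimate it separately and reports NA
print(coef(fit))
```

Dropping one of the two (for example `salary ~ . - education_num`) removes the redundancy without changing the fitted probabilities, since the remaining columns span the same space.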
# Calculating predictions
Z=predict(glm.fit,type="response")
#predict(type="response") returns P(salary == " >50K"); code "1" = predicted >50K, "2" = predicted <=50K
Z=ifelse(Z >.5,"1","2")

# contingency table of predicted code vs. actual salary
tbl<-table(Z,glm.fit$model$salary)
tbl
##    
## Z    <=50K  >50K
##   1   2435  6787
##   2  31578  4421
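The table above can be turned into the summary metrics directly. Because the model was fitted and evaluated on the same rows, these are in-sample figures and likely optimistic. The counts below are copied from the table, with ">50K" treated as the positive class; they work out to roughly 0.85 accuracy, 0.61 sensitivity and 0.93 specificity.

```r
# Counts copied from the table above: rows = predicted code ("1" = >50K,
# "2" = <=50K), columns = actual salary
tbl <- matrix(c(2435, 31578, 6787, 4421), nrow = 2,
              dimnames = list(Z = c("1", "2"), salary = c("<=50K", ">50K")))

accuracy   <- (tbl["1", ">50K"] + tbl["2", "<=50K"]) / sum(tbl)
err.rate   <- 1 - accuracy
sens.gt50k <- tbl["1", ">50K"] / sum(tbl[, ">50K"])    # share of >50K rows predicted >50K
spec.gt50k <- tbl["2", "<=50K"] / sum(tbl[, "<=50K"])  # share of <=50K rows predicted <=50K

print(round(c(accuracy = accuracy, error = err.rate,
              sensitivity = sens.gt50k, specificity = spec.gt50k), 4))
```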

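Based on the regression summary above, the variables significantly associated with the outcome (p < 0.001) include the following. Coefficients for factor levels are relative to each factor's baseline (first) level, e.g. workclass Federal-gov, education 10th and sex Female, so "negatively associated" means lower odds of earning >50K than the baseline level.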

*age - positively associated

*workclass Self-emp-not-inc - negatively associated

*education Bachelors - positively associated

*education Doctorate - positively associated

*education Masters - positively associated

*education Prof-school - positively associated

*occupation Exec-managerial - positively associated

*occupation Tech-support - positively associated

*relationship Wife - positively associated

*sex Male - positively associated

*capital_gain - positively associated

*capital_loss - positively associated

*hours_per_week - positively associated
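Positive and negative association can be read more concretely as odds ratios: exp(coefficient) is the factor by which the odds of >50K change for a one-unit increase in a numeric predictor, or for a factor level relative to its baseline, holding the other predictors fixed. For example, the sex Male estimate of 0.717838 corresponds to an odds ratio of about 2.05. The sketch below shows the computation on simulated data (the names `x`, `g` and `y` are hypothetical); with the real model the same call would be `exp(coef(glm.fit))`.

```r
set.seed(1)

# Simulated data (hypothetical): one numeric predictor and a two-level factor
n <- 2000
x <- rnorm(n)
g <- factor(sample(c("Female", "Male"), n, replace = TRUE))
y <- rbinom(n, 1, plogis(-1 + 0.8 * x + 0.7 * (g == "Male")))

fit <- glm(y ~ x + g, family = binomial)

# exp(coefficient) = multiplicative change in the odds of y = 1
odds.ratios <- exp(coef(fit))
print(round(odds.ratios, 2))
# Wald 95% intervals on the odds-ratio scale
print(round(exp(confint.default(fit)), 2))

# Odds ratio implied by the reported sex Male coefficient above
or.male <- exp(0.717838)
print(or.male)
```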


# check the levels of the salary factor (glm models the probability of the second level)
levels(noNAData$salary)
## [1] " <=50K" " >50K"
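For a factor response, `glm(..., family = binomial)` treats the first level as failure and every other level as success, so with the levels shown here the model, and `predict(..., type = "response")`, give P(salary == " >50K"). A tiny intercept-only check on made-up data:

```r
# Toy outcome with the same two levels as salary: 2 of 5 rows are " >50K"
salary <- factor(c(" <=50K", " >50K", " <=50K", " >50K", " <=50K"),
                 levels = c(" <=50K", " >50K"))

# Intercept-only logistic model: the fitted probability equals the observed
# share of the second level, i.e. P(salary == " >50K")
fit0 <- glm(salary ~ 1, family = binomial)
p.hat <- unname(fitted(fit0)[1])
print(p.hat)
```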
adult.cmplt<- noNAData

library(caret)  # provides confusionMatrix()

# error, sensitivity and specificity for each of 100 random 75/25 splits
errorLM<-numeric(100)
sensitivityLM<-numeric(100)
specificityLM<-numeric(100)
for ( iTry in 1:100 ) {
  # 25% of the rows (chosen at random) form the test set, the other 75% the training set
  test.idx = sample(1:nrow(adult.cmplt), size = 0.25*nrow(adult.cmplt))
  test.data = adult.cmplt[test.idx,]
  train.data = adult.cmplt[-test.idx,]

  # str() prints on every iteration, which is why its output repeats below
  str(train.data)

  # logistic regression on all predictors
  glm.fit<- glm(salary~., family=binomial(link='logit'),data = train.data)

  # add native_country levels that occur only in the test split so predict() does not fail;
  # test rows with such a level get no country coefficient, i.e. the baseline country
  glm.fit$xlevels[["native_country"]]<-union(glm.fit$xlevels[["native_country"]],levels(test.data$native_country))

  # predicted probability of " >50K", the second level of salary
  glm.pred<- predict(glm.fit, test.data, type = "response")

  # confusion table with predictions in rows and actual classes in columns,
  # the orientation confusionMatrix() expects for a table
  tabl<-table(predicted= glm.pred>0.5, actual= test.data$salary)
  dimnames(tabl)[[1]] = c(" <=50K"," >50K")

  # " <=50K" is the positive class, so Sensitivity is the share of <=50K rows
  # predicted <=50K and Specificity the share of >50K rows predicted >50K
  cm<-confusionMatrix(tabl, positive=" <=50K")
  sensitivityLM[iTry]<-cm$byClass['Sensitivity']
  specificityLM[iTry]<-cm$byClass['Specificity']
  errorLM[iTry]<-1-cm$overall['Accuracy']
}
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 7 12 13 10 16 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 5 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 8 4 10 4 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 2 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 3 5 5 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 22 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] -0.0415 1.0934 -0.1171 0.7907 1.0177 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 3 3 3 3 5 3 3 6 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 12 2 13 7 12 10 16 10 10 8 ...
##  $ education_num : num [1:33916, 1] -0.438 -1.222 1.52 -2.005 -0.438 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 1 3 3 4 3 3 3 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 6 6 4 8 4 4 4 10 1 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 6 2 1 1 1 1 4 2 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 5 3 5 5 3 2 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 2 2 2 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -2.0768 0.3383 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 22 38 38 38 18 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 6 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 7 12 13 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 5 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 8 4 10 4 10 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 2 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 3 5 5 5 2 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 22 38 38 38 18 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 13 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 6 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 13 7 12 10 16 10 ...
##  $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 4 3 3 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 4 4 10 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 6 2 1 1 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 3 2 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 18 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 5 3 3 6 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 12 13 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 4 3 5 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 8 4 10 4 10 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 2 1 2 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 5 2 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 18 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 12 16 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 3 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 4 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 1 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 5 3 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.866 -0.798 0.791 -0.571 0.261 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 3 6 3 3 5 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 7 13 10 16 10 8 6 12 ...
##  $ education_num : num [1:33916, 1] 1.13 1.13 -2 1.52 1.13 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 3 4 5 3 3 3 5 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 10 8 10 4 4 10 12 14 5 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 6 2 2 1 1 1 2 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 3 5 5 3 2 3 1 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 1 1 1 2 2 2 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 1.73 0.543 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -2.0768 0.7547 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 5 22 38 38 38 18 38 25 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 2 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 6 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 7 12 10 10 8 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 8 4 10 1 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 1 4 2 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 2 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 22 38 18 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 5 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 7 12 13 16 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 4 3 5 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 8 4 10 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 2 1 2 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 3 5 5 3 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 22 38 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 12 13 16 6 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 3 5 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 4 10 4 14 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 1 2 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 5 5 3 1 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 38 38 38 25 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.8664 -0.0415 -0.571 -0.1171 -0.6467 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 6 3 3 3 3 5 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 13 16 10 10 6 12 2 13 ...
##  $ education_num : num [1:33916, 1] 1.1287 -0.4381 1.5204 -0.0464 1.1287 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 5 3 3 5 3 5 3 1 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 10 4 10 1 14 7 12 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 2 1 1 4 1 5 1 5 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 2 5 1 5 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 2 2 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 1.73 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 0.7547 3.2531 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 18 38 25 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 2 2 2 1 1 1 1 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 7 12 13 10 16 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 5 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 8 4 10 4 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 2 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 5 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 22 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.866 1.093 -0.117 0.791 0.261 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 3 6 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 2 13 7 10 16 10 10 8 6 ...
##  $ education_num : num [1:33916, 1] 1.13 -1.22 1.52 -2 1.13 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 3 3 4 3 3 3 5 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 4 8 4 4 10 1 12 14 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 1 6 2 1 1 1 4 2 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 5 3 5 3 2 5 3 1 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 2 2 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 0.543 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -2.0768 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 22 38 38 18 38 38 25 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.866 1.093 0.791 1.018 -0.571 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 5 3 3 3 6 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 2 7 12 13 10 16 10 8 6 ...
##  $ education_num : num [1:33916, 1] 1.129 -1.222 -2.005 -0.438 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 3 4 3 5 3 3 3 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 8 4 10 4 4 10 12 14 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 1 2 1 2 1 1 1 2 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 3 5 5 5 3 2 3 1 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 2 1 2 2 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 1.73 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -2.0768 0.3383 0.7547 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 22 38 38 38 38 18 38 25 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 13 7 12 13 10 8 ...
##  $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 4 3 5 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 10 10 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 6 2 1 2 1 2 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 2 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 5 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 12 10 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 4 3 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 8 4 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 2 1 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 1.0934 -0.798 -0.1171 1.0177 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 5 3 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 2 10 13 12 13 10 16 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -1.222 1.129 1.52 -0.438 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 5 3 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 10 4 4 10 4 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 6 6 1 2 1 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 3 5 5 5 5 3 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 2 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 0.3383 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 38 38 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 7 10 16 8 12 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 8 4 4 12 7 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 1 2 5 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 3 3 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 22 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.8664 -0.0415 1.0934 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 5 3 6 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 13 7 12 13 10 10 2 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 4 3 5 3 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 6 4 8 4 10 10 1 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 6 2 1 2 1 4 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 2 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 22 38 38 18 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 10 16 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 4 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 1.0177 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 5 3 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 12 13 10 16 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 -0.438 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 5 3 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 10 4 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 1 2 1 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 5 5 3 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 2 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 0.3383 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 38 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 13 10 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 4 5 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 8 10 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 2 2 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 1 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 5 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 7 12 13 10 10 10 ...
##  $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 -2 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 4 3 5 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 8 4 10 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 2 1 2 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 3 5 5 5 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 22 38 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 0.7907 1.0177 -0.571 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 5 3 3 3 6 3 5 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 7 12 13 10 16 10 10 12 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -2.005 -0.438 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 4 3 5 3 3 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 8 4 10 4 4 10 1 5 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 2 1 2 1 1 1 4 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 5 5 3 2 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 2 1 2 2 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -2.0768 0.3383 0.7547 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 22 38 38 38 38 18 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 3 3 6 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 13 16 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 5 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 10 4 10 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 2 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 3 2 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 18 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.1171 0.7907 1.0177 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 5 3 3 6 3 5 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 13 7 12 10 16 10 10 12 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 1.52 -2.005 -0.438 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 4 3 3 3 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 4 8 4 4 4 10 1 5 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 6 2 1 1 1 1 4 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 5 3 2 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 2 2 2 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -2.0768 0.3383 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 22 38 38 38 18 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 13 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.8664 -0.0415 1.0934 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 5 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 13 7 12 13 16 10 8 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 4 3 5 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 6 4 8 4 10 4 10 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 6 2 1 2 1 1 2 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 3 2 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 22 38 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 -0.798 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 7 13 10 16 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 4 5 3 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 10 4 8 10 4 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 6 6 2 2 1 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 3 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.8664 -0.0415 -0.798 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 3 6 3 3 5 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 7 10 10 10 6 12 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 4 3 3 5 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 10 4 8 4 10 1 14 5 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 6 6 2 1 1 4 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 2 5 1 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 2 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 18 38 25 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 13 7 10 16 10 10 ...
##  $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 4 3 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 6 2 1 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 3 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 1.0177 -0.571 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 5 3 6 3 3 5 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 12 13 10 10 6 12 12 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 -0.438 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 5 3 5 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 4 10 10 1 14 5 7 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 1 2 1 4 1 4 5 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 5 2 5 1 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 2 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 0.3383 0.7547 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 18 38 25 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 1 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.8664 -0.0415 1.0934 -0.798 0.2612 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 3 6 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 10 16 10 10 8 6 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 3 3 3 5 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 6 10 4 4 10 1 12 14 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 6 1 1 1 4 2 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 2 5 3 1 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 2 2 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 0.543 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 38 18 38 38 25 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 13 7 12 13 10 16 ...
##  $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 4 3 5 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 10 4 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 6 2 1 2 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 7 12 13 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 5 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 8 4 10 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 2 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 3 5 5 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 22 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 1.0934 -0.571 0.2612 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 6 3 5 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 2 13 10 16 10 10 12 12 2 ...
##  $ education_num : num [1:33916, 1] 1.1287 -1.2215 1.5204 1.1287 -0.0464 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 5 3 3 3 5 5 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 10 4 4 10 1 5 7 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 1 1 4 4 5 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 5 5 3 2 5 5 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 2 2 2 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 1.73 0.543 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 0.7547 -0.0781 3.2531 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 18 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 2 2 2 2 1 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.8664 -0.0415 -0.798 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 5 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 7 12 13 16 10 6 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 4 3 5 3 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 10 4 8 4 10 4 10 14 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 6 6 2 1 2 1 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 3 2 1 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 2 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 38 38 18 25 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 6 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 7 12 13 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 5 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 8 4 10 4 10 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 2 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 5 5 2 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 22 38 38 38 18 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.1171 -0.571 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 6 3 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 13 13 10 10 8 6 12 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.52 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 5 3 5 5 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 4 10 10 1 12 14 7 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 2 1 4 2 1 5 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 5 2 5 3 1 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 0.7547 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 18 38 38 25 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 1 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 10 16 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 4 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.8664 -0.0415 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 3 5 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 12 10 16 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 3 4 3 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 6 10 4 8 4 4 4 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 6 6 2 1 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 3 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] -0.0415 1.0934 -0.798 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 3 3 3 3 3 5 3 6 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 12 2 10 13 7 12 10 10 10 8 ...
##  $ education_num : num [1:33916, 1] -0.438 -1.222 1.129 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 1 3 3 3 4 3 3 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 6 6 10 4 8 4 4 10 1 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 6 6 2 1 1 1 4 2 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 3 5 3 5 5 2 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 2 2 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 38 18 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 6 3 5 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 10 13 7 16 10 8 12 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 10 12 5 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 6 6 2 1 1 2 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 3 2 3 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 18 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 7 12 10 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 8 4 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 3 5 5 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 22 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.571 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 3 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 10 16 10 8 6 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 5 3 3 5 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 10 4 4 1 12 14 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 2 1 1 4 2 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 5 3 5 3 1 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 0.7547 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 38 38 38 38 25 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 5 3 6 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 12 10 10 10 8 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 3 3 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 4 4 10 1 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 1 1 1 4 2 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 5 5 2 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 2 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 38 38 18 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 7 13 16 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 5 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 8 10 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 2 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 3 5 3 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 22 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 5 3 3 6 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 12 13 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 4 3 5 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 8 4 10 4 10 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 2 1 2 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 5 2 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 18 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 13 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 1.0934 -0.1171 1.0177 -0.571 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 5 3 3 3 5 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 2 13 12 13 10 8 12 12 2 ...
##  $ education_num : num [1:33916, 1] 1.129 -1.222 1.52 -0.438 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 5 3 5 5 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 4 4 10 4 12 5 7 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 6 1 2 1 2 4 5 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 5 5 5 5 3 5 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 2 1 2 2 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 0.3383 0.7547 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 38 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 1 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 6 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 7 12 10 8 6 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 3 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 8 4 10 12 14 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 1 2 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 2 3 1 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 22 38 18 38 25 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 5 3 3 3 6 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 12 13 10 16 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 3 5 3 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 4 10 4 4 10 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 1 2 1 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 5 5 5 3 2 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 38 38 38 38 18 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 3 3 6 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 13 7 13 10 16 10 ...
##  $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 4 5 3 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 10 4 4 10 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 6 2 2 1 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 3 2 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 18 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 -0.798 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 5 3 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 7 12 13 10 16 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 4 3 5 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 10 4 8 4 10 4 4 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 6 6 2 1 2 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 5 3 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 2 1 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 -0.798 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 3 6 3 5 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 7 13 10 10 10 12 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 4 5 3 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 10 4 8 10 4 10 1 5 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 6 6 2 2 1 1 4 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 2 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 1 2 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 38 18 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 10 13 7 12 10 16 6 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 3 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 4 4 14 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 6 6 2 1 1 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 5 3 1 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 25 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 12 13 10 16 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 3 5 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 4 10 4 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 1 2 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 5 5 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 -0.798 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 3 6 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 7 13 10 10 8 6 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 4 5 3 3 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 10 4 8 10 4 10 12 14 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 6 6 2 2 1 1 2 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 2 3 1 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 1 2 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 38 18 38 25 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 1.093 -0.798 0.791 -0.571 0.261 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 3 3 3 3 3 3 6 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 2 10 7 13 10 16 10 10 8 6 ...
##  $ education_num : num [1:33916, 1] -1.22 1.13 -2 1.52 1.13 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 3 4 5 3 3 3 5 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 6 10 8 10 4 4 10 1 12 14 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 6 2 2 1 1 1 4 2 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 3 3 3 5 5 3 2 5 3 1 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 1 1 1 2 2 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 1.73 0.543 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -2.0768 0.7547 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 5 22 38 38 38 18 38 38 25 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 2 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 5 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 12 13 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 4 3 5 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 8 4 10 4 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 2 1 2 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 5 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 12 10 16 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 4 3 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 8 4 4 4 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 2 1 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 3 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 5 3 3 3 6 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 13 7 12 13 10 16 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 4 3 5 3 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 4 8 4 10 4 4 10 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 6 2 1 2 1 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 5 3 5 5 5 3 2 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 22 38 38 38 38 18 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.798 -0.1171 -0.571 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 3 3 5 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 10 13 13 10 16 10 8 12 ...
##  $ education_num : num [1:33916, 1] 1.13 1.13 1.13 1.52 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 5 3 3 5 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 10 4 10 4 4 1 12 5 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 6 6 2 1 1 4 2 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 5 5 3 5 3 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 2 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 0.7547 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 38 38 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.8664 -0.0415 -0.798 0.7907 1.0177 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 5 3 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 7 12 13 10 16 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 -2.005 -0.438 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 4 3 5 3 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 10 8 4 10 4 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 6 2 1 2 1 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 5 5 3 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 2 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -2.0768 0.3383 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 22 38 38 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 3 6 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 13 7 10 16 10 10 8 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 4 3 3 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 4 8 4 4 10 1 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 2 1 1 1 4 2 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 3 2 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 2 2 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 22 38 38 18 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.0415 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 3 3 5 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 2 10 13 7 12 13 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 -1.222 1.129 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 1 3 3 3 4 3 5 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 6 6 10 4 8 4 10 4 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 2 1 6 6 2 1 2 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.798 -0.1171 0.7907 1.0177 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 3 5 3 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 13 7 12 13 10 16 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 1.52 -2.005 -0.438 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 4 3 5 3 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 10 4 8 4 10 4 4 10 1 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 6 6 2 1 2 1 1 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 5 3 5 5 5 3 2 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 1 1 1 2 1 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -0.0781 -2.0768 0.3383 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 5 38 22 38 38 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.8664 -0.0415 -0.798 -0.1171 0.7907 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 3 5 3 3 6 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 7 12 13 16 10 8 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -2.005 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 4 3 5 3 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 10 4 8 4 10 4 10 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 6 6 2 1 2 1 1 2 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 3 5 5 3 2 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 1 2 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 -2.0768 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 22 38 38 38 18 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 2 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 6 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 7 12 10 16 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 4 3 3 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 8 4 4 4 10 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 2 1 1 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 3 5 5 3 2 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 2 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 22 38 38 38 18 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.8664 -0.0415 -0.798 -0.1171 1.0177 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 5 3 3 3 5 3 3 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 12 10 13 12 13 10 16 8 6 ...
##  $ education_num : num [1:33916, 1] 1.129 -0.438 1.129 1.52 -0.438 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 1 3 3 3 5 3 3 5 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 4 6 10 4 4 10 4 4 12 14 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 6 6 1 2 1 1 2 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 5 5 5 5 3 3 1 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 1 1 2 1 2 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] -0.147 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -2.3267 -0.0781 -0.0781 -0.0781 0.3383 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 5 38 38 38 38 38 38 25 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 2 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 1.0177 -0.571 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 5 3 3 6 3 3 3 5 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 13 16 10 10 8 6 12 ...
##  $ education_num : num [1:33916, 1] 1.1287 1.1287 -0.4381 1.5204 -0.0464 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 5 3 3 5 5 3 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 4 10 4 10 1 12 14 5 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 2 1 1 4 2 1 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 5 3 2 5 3 1 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 2 2 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 1.73 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 0.3383 0.7547 3.2531 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 18 38 38 25 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 2 2 2 2 1 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 -0.798 0.7907 1.0177 -0.571 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 3 3 5 3 3 6 3 3 5 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 7 12 13 16 10 10 8 12 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -2.005 -0.438 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 4 3 5 3 3 5 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 10 8 4 10 4 10 1 12 5 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 6 2 1 2 1 1 4 2 4 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 3 3 5 5 3 2 5 3 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 1 1 2 1 2 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -0.0781 -2.0768 0.3383 0.7547 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 5 22 38 38 38 18 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 2 1 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 1.0934 -0.798 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 5 3 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 2 10 13 7 12 13 10 16 ...
##  $ education_num : num [1:33916, 1] 1.13 1.13 -1.22 1.13 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 3 3 3 4 3 5 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 10 4 8 4 10 4 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 1 6 6 2 1 2 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 3 3 5 3 5 5 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 1 1 1 2 1 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 5 38 22 38 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 2 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0177 -0.571 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 5 3 3 6 3 5 5 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 12 13 16 10 8 12 13 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -0.438 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 5 3 3 5 5 1 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 4 10 4 10 12 5 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 2 1 1 2 4 5 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 5 5 3 2 3 5 5 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 2 2 2 2 1 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 1.73 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 0.3383 0.7547 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 38 18 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 2 2 2 2 1 1 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 6 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 10 10 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 4 10 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 2 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 18 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.798 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 10 16 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.129 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 3 3 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 10 4 8 4 4 4 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 1 1 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 2 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 5 38 22 38 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
## 'data.frame':    33916 obs. of  14 variables:
##  $ age           : num [1:33916, 1] 0.0342 0.8664 -0.0415 1.0934 -0.1171 ...
##  $ workclass     : Factor w/ 7 levels " Federal-gov",..: 6 5 3 3 3 5 3 6 3 3 ...
##  $ education     : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 13 12 13 10 10 8 ...
##  $ education_num : num [1:33916, 1] 1.129 1.129 -0.438 -1.222 1.52 ...
##  $ marital_status: Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 5 3 5 5 ...
##  $ occupation    : Factor w/ 14 levels " Adm-clerical",..: 1 4 6 6 4 4 10 10 1 12 ...
##  $ relationship  : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 1 2 1 4 2 ...
##  $ race          : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 5 5 5 2 5 3 ...
##  $ sex           : Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 2 1 2 1 2 ...
##  $ capital_gain  : num [1:33916, 1] 0.143 -0.147 -0.147 -0.147 -0.147 ...
##  $ capital_loss  : num [1:33916, 1] -0.219 -0.219 -0.219 -0.219 -0.219 ...
##  $ hours_per_week: num [1:33916, 1] -0.0781 -2.3267 -0.0781 -0.0781 -0.0781 ...
##  $ native_country: Factor w/ 40 levels " Cambodia"," Canada",..: 38 38 38 38 38 38 38 18 38 38 ...
##  $ salary        : Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 2 2 2 1 1 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:3620] 15 28 39 52 62 70 78 94 107 129 ...
##   .. ..- attr(*, "names")= chr [1:3620] "15" "28" "39" "52" ...
mean(sensitivityLM)
## [1] 0.8763368
mean(specificityLM)
## [1] 0.7327439
mean(errorLM)
## [1] 0.1529889

After repeatedly splitting the data into training and test subsets, logistic regression shows a mean sensitivity of about 88%, a specificity of about 73% and an accuracy of about 85% (mean test error 15.3%).
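All three summary figures can be read off a single 2x2 confusion table. The base-R sketch below makes two assumptions, both matching what caret::confusionMatrix() does for a table: rows are predictions, columns are the true classes, and the first level (<=50K here) is the positive class. The helper name and the toy counts are hypothetical.

```r
# Hypothetical helper: sensitivity, specificity and accuracy from a 2x2 table
# with predicted classes in rows and true classes in columns; the first level
# is treated as the positive class, as caret does by default.
binary_metrics <- function(tbl) {
  stopifnot(all(dim(tbl) == c(2, 2)))
  tp <- tbl[1, 1]  # predicted positive, truly positive
  fn <- tbl[2, 1]  # predicted negative, truly positive
  fp <- tbl[1, 2]  # predicted positive, truly negative
  tn <- tbl[2, 2]  # predicted negative, truly negative
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    accuracy    = (tp + tn) / sum(tbl))
}

# Toy counts (not from the census data)
toy <- matrix(c(90, 10, 20, 30), nrow = 2,
              dimnames = list(predicted = c("<=50K", ">50K"),
                              truth     = c("<=50K", ">50K")))
binary_metrics(toy)   # sensitivity 0.9, specificity 0.6, accuracy 0.8
```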

Logistic regression is compared with random forest and SVM in Problem 5 below.


Problem 3: random forest (25 points)

Develop random forest model of the categorized income. Present variable importance plots and comment on relative importance of different attributes in the model. Did attributes showing up as more important in random forest model also appear as significantly associated with the outcome by logistic regression? Test model performance on multiple splits of data into training and test subsets, compare test and out-of-bag error estimates, summarize model performance in terms of accuracy/error, sensitivity/specificity and compare to the performance of other methods reported in the dataset description.

# Random forest on whole data
rfOutput <- randomForest(factor(salary)~.,  importance=TRUE,data=noNAData)
# variable(s) importance plot
varImpPlot(rfOutput)

plot(rfOutput)
legend("top", colnames(rfOutput$err.rate),col=1:6,cex=0.8,fill=1:6)

# test model performance with random forest on 100 random training/test splits
errorRF<-numeric(100)
errorRFmTry<-numeric(100)
sensitivityRF<-numeric(100)
specificityRF<-numeric(100)
# put all these in a loop
for ( iTry in 1:100 ) {
  # each row goes to training (TRUE) or test (FALSE) with probability 1/2
  bTrain <- sample(c(FALSE,TRUE),nrow(noNAData),replace=TRUE)

  # caret::confusionMatrix() reads a table as predictions in rows and true classes in
  # columns; the run printed below used table(truth, prediction), so its
  # "Sensitivity" and "Specificity" are really the predictive values
  rfRes <- randomForest(factor(salary)~.,  importance=TRUE,data=noNAData[bTrain,])
  rfTbl <- table(predict(rfRes,newdata=noNAData[!bTrain,]),factor(noNAData[!bTrain,]$salary))

  # same split, mtry=5
  rfResmTRy <- randomForest(factor(salary)~.,  importance=TRUE,data=noNAData[bTrain,],mtry=5)
  rfTblmTry <- table(predict(rfResmTRy,newdata=noNAData[!bTrain,]),factor(noNAData[!bTrain,]$salary))

  cm<-confusionMatrix(rfTbl)
  cmmTRy<-confusionMatrix(rfTblmTry)
  sensitivityRF[iTry]<-cm$byClass['Sensitivity']
  specificityRF[iTry]<-cm$byClass['Specificity']
  errorRF[iTry]<-1-cm$overall['Accuracy']

  # the mtry=5 error must come from cmmTRy; taking it from cm, as in the run
  # printed below, makes errorRFmTry an exact copy of errorRF
  errorRFmTry[iTry]<-1-cmmTRy$overall['Accuracy']
}
mean(sensitivityRF)
## [1] 0.807315
mean(specificityRF)
## [1] 0.9578385
mean(errorRF)
## [1] 0.1818734
mean(errorRFmTry)
## [1] 0.1818734

By MeanDecreaseAccuracy, the variable importance plots rank capital_gain, capital_loss and marital_status as the most important attributes. This measure is the mean decrease in accuracy of the out-of-bag predictions when the values of a variable are randomly permuted.

By MeanDecreaseGini, the most important attributes are capital_gain, relationship and age. This measure is the total decrease in node impurity (Gini index) from splits on a variable, averaged over all trees.
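The Gini measure can be made concrete with a toy node. The base-R sketch below computes the size-weighted impurity decrease for one split. randomForest sums such decreases over every split on a variable and averages them over trees, and its internal weighting may differ in detail from this illustration. The function names are hypothetical.

```r
# Gini impurity of a node: 1 - sum over classes of p_k^2
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Size-weighted impurity decrease when node y is split by logical vector goes_left
gini_decrease <- function(y, goes_left) {
  left  <- y[goes_left]
  right <- y[!goes_left]
  length(y) * gini(y) - length(left) * gini(left) - length(right) * gini(right)
}

# Toy node: 6 "<=50K" and 4 ">50K"
y <- c(rep("<=50K", 6), rep(">50K", 4))
gini(y)                                  # 1 - (0.36 + 0.16) = 0.48
split <- c(rep(TRUE, 5), rep(FALSE, 5))  # left: 5 "<=50K"; right: 1 "<=50K", 4 ">50K"
gini_decrease(y, split)                  # 10 * 0.48 - 5 * 0 - 5 * 0.32 = 3.2
```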

The error-rate plot of rfOutput shows that the OOB error and the error rates for the <=50K and >50K classes all level off after about 50 trees.
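The OOB error in that plot is estimated from the rows each tree did not see. Each tree is grown on a bootstrap sample of the training rows, so a given row is left out with probability (1 - 1/n)^n, which is about exp(-1), or 36.8%. The base-R sketch below checks this by simulation and is not randomForest code. For a fitted randomForest object, the final OOB error is the last row of the "OOB" column of its err.rate matrix. That is one natural value to compare with the test errors from the loop.

```r
# Each tree sees a bootstrap sample of the n training rows; rows never drawn are
# out-of-bag (OOB) for that tree and are used for the OOB error estimate.
set.seed(1)
n <- 10000
boot_rows <- sample(n, n, replace = TRUE)        # one bootstrap sample
oob_fraction <- 1 - length(unique(boot_rows)) / n
oob_fraction                                     # simulated OOB share
(1 - 1 / n)^n                                    # probability a row is never drawn
exp(-1)                                          # its large-n limit
```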

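After repeatedly splitting the data into training and test subsets, random forest shows a mean sensitivity of about 81%, a specificity of about 96% and an accuracy of about 82% (mean test error 18.2%). The printed sensitivity and specificity values were computed from tables with the true classes in the rows. Because caret::confusionMatrix() reads the rows as predictions, these values are actually the predictive values for <=50K and >50K. The accuracy is unaffected.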
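Table orientation matters because caret computes "sensitivity" as the diagonal cell divided by its column total. Transposing the table therefore turns sensitivity and specificity into the positive and negative predictive values, while accuracy stays the same. The base-R sketch below demonstrates this on toy counts; col_rate is a hypothetical name.

```r
# Toy 2x2 counts (not census data); "<=50K" is the positive class
lv <- c("<=50K", ">50K")
right_way <- matrix(c(90, 10, 20, 30), nrow = 2,
                    dimnames = list(predicted = lv, truth = lv))
swapped <- t(right_way)   # true classes in rows, as table(truth, predicted) gives

# caret-style rate for class j: diagonal cell over the column-j total
col_rate <- function(tbl, j) tbl[j, j] / sum(tbl[, j])

col_rate(right_way, 1)   # sensitivity: 90 / (90 + 10) = 0.9
col_rate(swapped, 1)     # 90 / (90 + 20), the positive predictive value
col_rate(right_way, 2)   # specificity: 30 / (20 + 30) = 0.6
col_rate(swapped, 2)     # 30 / (10 + 30) = 0.75, the negative predictive value
sum(diag(right_way)) / sum(right_way) == sum(diag(swapped)) / sum(swapped)  # TRUE
```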

Random forest is compared with logistic regression and SVM in Problem 5 below.


Problem 4: SVM (25 points)

Develop SVM model of this data choosing parameters (e.g. choice of kernel, cost, etc.) that appear to yield better performance. Test model performance on multiple splits of data into training and test subsets, summarize model performance in terms of accuracy/error, sensitivity/specificity and compare to the performance of other methods reported in the dataset description.

# tune cost and gamma of a radial-kernel SVM once (10-fold cross-validation, the tune() default)
# this is slow on the full data; a random subset of rows can be passed instead
tune.out=tune(svm,as.factor(salary) ~ .,data=noNAData,kernel="radial",ranges=list(cost=c( 1,2,5,10,20, 100),gamma=c(0.01,0.02,0.05,0.1,0.2)),scale = FALSE)
 cValue<-tune.out$best.parameters$cost
 gValue<-tune.out$best.parameters$gamma 
 
#run the SVM 
svmfit=svm(as.factor(salary) ~ ., data=noNAData, kernel="radial",cost=cValue,gamma=gValue)
summary(svmfit)
## 
## Call:
## svm(formula = as.factor(salary) ~ ., data = noNAData, kernel = "radial", 
##     cost = cValue, gamma = gValue)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  2 
##       gamma:  0.2 
## 
## Number of Support Vectors:  16514
## 
##  ( 9132 7382 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##   <=50K  >50K
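The radial kernel's gamma sets how quickly similarity decays with distance. A larger gamma gives each support vector a smaller neighbourhood of influence. The tuned value of 0.2 is the largest in the grid, so values beyond it may be worth checking. The base-R sketch below builds the kernel matrix for toy points, assuming the parameterisation documented for e1071::svm, K(u, v) = exp(-gamma * |u - v|^2).

```r
# Radial (Gaussian) kernel matrix between the rows of X
radial_kernel <- function(X, gamma) {
  d2 <- as.matrix(dist(X))^2        # squared Euclidean distances between rows
  exp(-gamma * d2)
}

X <- rbind(c(0, 0), c(1, 0), c(3, 4))   # three toy points
round(radial_kernel(X, gamma = 0.2), 4)  # far-apart points are nearly unrelated
round(radial_kernel(X, gamma = 0.01), 4) # small gamma: all points look similar
```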
sensitivitySVM<-numeric(100)
specificitySVM<-numeric(100)
errorSVM<-numeric(100)
smp_size <- floor(0.80 * nrow(noNAData))
# repeat on 100 random 80/20 training/test splits, reusing the tuned cost and gamma
for ( iTry in 1:100 ) {
  # sample training rows without replacement so that training and test sets are disjoint
  train_ind <- sample(seq_len(nrow(noNAData)), size = smp_size)
  train <- noNAData[train_ind, ]
  test <- noNAData[-train_ind, ]

  # fit directly with the tuned parameters; tune() over a single cost/gamma pair
  # only adds 10 cross-validation fits before refitting the same model
  svmRes <- svm(as.factor(salary) ~ ., data=train, kernel="radial", cost=cValue, gamma=gValue, scale=FALSE)
  pOut <- predict(svmRes, test[,-14])   # column 14 is salary
  tbl <- table(predict=pOut, truth=test[,14])

  cm <- confusionMatrix(tbl)
  sensitivitySVM[iTry] <- cm$byClass['Sensitivity']
  specificitySVM[iTry] <- cm$byClass['Specificity']
  errorSVM[iTry] <- 1 - cm$overall['Accuracy']
}


mean(sensitivitySVM)
## [1] 0.928466
mean(specificitySVM)
## [1] 0.6135384
mean(errorSVM)
## [1] 0.1496539

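Because SVM analysis took a very long time, only 1000 observations were selected for it. Note, however, that the calls shown above use the full noNAData. The fit summarized above also has 16514 support vectors, so it was trained on far more than 1000 observations.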

After repeatedly splitting the data into training and test subsets, SVM shows a mean sensitivity of about 93%, a specificity of about 61% and an accuracy of about 85% (mean test error 15.0%).
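An 80/20 split is easiest to reason about when the training rows are drawn without replacement, so that the test set is exactly the remaining 20%. The base-R sketch below uses a hypothetical helper. It also checks how large the held-out part becomes if the same number of indices is drawn with replacement instead.

```r
# Hypothetical helper: random split of rows 1..n into disjoint training and test
# index sets, with floor(frac * n) rows used for training.
split_rows <- function(n, frac = 0.8) {
  train <- sample(seq_len(n), size = floor(frac * n))   # without replacement
  list(train = train, test = setdiff(seq_len(n), train))
}

set.seed(42)
s <- split_rows(1000)
length(s$train)   # 800
length(s$test)    # 200

# Drawing 800 indices *with* replacement leaves roughly exp(-0.8), about 45%,
# of the rows outside the training set.
boot <- sample(seq_len(1000), size = 800, replace = TRUE)
oob_share <- mean(!(seq_len(1000) %in% boot))
oob_share
```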

SVM is compared with logistic regression and random forest in Problem 5 below.

Problem 5: compare logistic regression, random forest and SVM model performance (5 points)

Compare performance of the models developed above (logistic regression, random forest, SVM) in terms of their accuracy, error and sensitivity/specificity. Comment on differences and similarities between them.

# box plots of the per-split results for the three models
# sensitivity
boxplot(list(LR=sensitivityLM,RF=sensitivityRF,SVM=sensitivitySVM), main="Sensitivity")

# error; errorRFmTry is the random forest test error with mtry=5 (not an out-of-bag estimate)
boxplot(list(LR=errorLM,RF=errorRF,SVM=errorSVM,RFmtry5=errorRFmTry), main="Test error")

# specificity
boxplot(list(LR=specificityLM,RF=specificityRF,SVM=specificitySVM), main="Specificity")


The box plots above compare logistic regression (LR), random forest (RF) and SVM in terms of sensitivity, specificity and error.

Sensitivity

SVM has the highest mean sensitivity (0.93), followed by LR (0.88). RF has the lowest (0.81).

Accuracy

SVM (mean test error 15.0%) and LR (15.3%) have almost the same accuracy, about 85%. RF is somewhat less accurate (error 18.2%, accuracy about 82%).

Specificity

RF has the highest mean specificity (0.96), followed by LR (0.73) and then SVM (0.61).
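Each metric was recorded on many splits, so its spread across splits shows whether a gap, such as the one between SVM and LR error, is larger than split-to-split noise. The base-R sketch below defines a hypothetical helper that could be called as summarize_metric(list(LR = errorLM, RF = errorRF, SVM = errorSVM)). The toy vectors are not results from this analysis.

```r
# Hypothetical helper: mean and standard deviation of one metric across splits,
# one row per model; expects a named list of numeric vectors.
summarize_metric <- function(metric_list) {
  data.frame(model = names(metric_list),
             mean  = vapply(metric_list, mean, numeric(1)),
             sd    = vapply(metric_list, sd, numeric(1)),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Toy values only
summarize_metric(list(A = c(0.14, 0.16, 0.15), B = c(0.18, 0.19, 0.17)))
```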


Extra 10 points: KNN model

Develop KNN model for this data, evaluate its performance for different values of \(k\) on different splits of the data into training and test and compare it to the performance of other methods reported in the dataset description. Notice that this dataset includes many categorical variables as well as continuous attributes measured on different scales, so that the distance has to be defined to be meaningful (probably avoiding subtraction of the numerical values of multi-level factors directly or adding differences between untransformed age and capital gain/loss attributes).
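The problem statement asks for a meaningful distance. One way to get one is to expand each factor into 0/1 indicators and standardise the numeric attributes before computing Euclidean distances. The minimal base-R sketch below uses a made-up four-row data frame that borrows column names from the census data. How noNAData.num was actually built is not shown in this section. The relative weighting of indicator and numeric columns is a design choice left untuned here.

```r
# Toy data frame (invented values) with census-style columns
toy <- data.frame(
  age          = c(25, 40, 58, 33),
  capital_gain = c(0, 5000, 0, 15000),
  workclass    = factor(c("Private", "State-gov", "Private", "Self-emp"))
)

num_cols <- sapply(toy, is.numeric)
scaled   <- scale(toy[, num_cols])                    # mean 0, sd 1 per numeric column
dummies  <- model.matrix(~ workclass - 1, data = toy) # one 0/1 column per level
X <- cbind(scaled, dummies)

round(as.matrix(dist(X)), 3)   # Euclidean distances between the four rows
```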

# KNN tuned by 10-fold cross-validation on noNAData.num (the data with categorical variables converted to numeric)

knn.cross <- tune.knn(x = noNAData.num[,-14], y = as.factor(noNAData.num[,14]), k = 1:50, tunecontrol=tune.control(sampling = "cross", cross = 10))
#Summarize the resampling results set
summary(knn.cross)
## 
## Parameter tuning of 'knn.wrapper':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##   k
##  18
## 
## - best performance: 0.1657858 
## 
## - Detailed performance results:
##     k     error  dispersion
## 1   1 0.1967448 0.003495239
## 2   2 0.1973417 0.005436267
## 3   3 0.1782578 0.002864099
## 4   4 0.1787001 0.004112433
## 5   5 0.1730169 0.004224332
## 6   6 0.1714248 0.003880075
## 7   7 0.1685057 0.003546059
## 8   8 0.1687268 0.004870304
## 9   9 0.1681961 0.003911997
## 10 10 0.1692133 0.005270897
## 11 11 0.1675106 0.003904686
## 12 12 0.1666261 0.003966829
## 13 13 0.1664712 0.003938735
## 14 14 0.1675770 0.003261339
## 15 15 0.1660511 0.004744933
## 16 16 0.1668029 0.004101612
## 17 17 0.1663607 0.004378648
## 18 18 0.1657858 0.004135437
## 19 19 0.1659848 0.004606541
## 20 20 0.1658522 0.004122050
## 21 21 0.1666262 0.004269018
## 22 22 0.1665819 0.003631125
## 23 23 0.1662281 0.003983717
## 24 24 0.1658301 0.004034915
## 25 25 0.1660733 0.004368836
## 26 26 0.1663608 0.004120839
## 27 27 0.1667146 0.003650546
## 28 28 0.1668251 0.004033352
## 29 29 0.1662281 0.004150954
## 30 30 0.1663165 0.004011329
## 31 31 0.1669799 0.004394028
## 32 32 0.1670905 0.004786991
## 33 33 0.1666924 0.004539544
## 34 34 0.1669357 0.005398175
## 35 35 0.1670462 0.004746109
## 36 36 0.1669135 0.004760006
## 37 37 0.1669578 0.004600875
## 38 38 0.1676212 0.004693426
## 39 39 0.1671126 0.004253952
## 40 40 0.1671790 0.004088081
## 41 41 0.1673780 0.004337776
## 42 42 0.1676213 0.004659308
## 43 43 0.1680636 0.005465432
## 44 44 0.1675328 0.005080343
## 45 45 0.1674886 0.005047222
## 46 46 0.1671789 0.005152738
## 47 47 0.1680635 0.004610312
## 48 48 0.1684394 0.005133739
## 49 49 0.1684615 0.005165149
## 50 50 0.1683731 0.005144078
plot(knn.cross)

knn.cross$best.parameters
##     k
## 18 18
# resampling by bootstrapping on the full data set
knn.boot <- tune.knn(x = noNAData.num[,-14], y = as.factor(noNAData.num[,14]), k = 1:50,tunecontrol=tune.control(sampling = "boot") )
#Summarize the resampling results set
summary(knn.boot)
## 
## Parameter tuning of 'knn.wrapper':
## 
## - sampling method: bootstrapping 
## 
## - best parameters:
##   k
##  25
## 
## - best performance: 0.1714755 
## 
## - Detailed performance results:
##     k     error  dispersion
## 1   1 0.2016709 0.002422931
## 2   2 0.2029142 0.002264352
## 3   3 0.1977014 0.002008570
## 4   4 0.1945068 0.002525054
## 5   5 0.1884362 0.003111751
## 6   6 0.1853747 0.003288427
## 7   7 0.1817465 0.002807767
## 8   8 0.1803274 0.003520797
## 9   9 0.1787451 0.002952522
## 10 10 0.1780873 0.002814019
## 11 11 0.1767073 0.002983977
## 12 12 0.1759795 0.003140198
## 13 13 0.1752343 0.002673389
## 14 14 0.1755010 0.002525206
## 15 15 0.1741370 0.002931879
## 16 16 0.1738447 0.002871186
## 17 17 0.1727826 0.003006141
## 18 18 0.1731727 0.003040984
## 19 19 0.1726582 0.002688510
## 20 20 0.1721918 0.002458680
## 21 21 0.1722126 0.002193793
## 22 22 0.1723376 0.002601168
## 23 23 0.1721416 0.002631828
## 24 24 0.1721224 0.002853456
## 25 25 0.1714755 0.002636775
## 26 26 0.1722034 0.002282562
## 27 27 0.1718445 0.002610336
## 28 28 0.1719802 0.002455831
## 29 29 0.1718078 0.002448981
## 30 30 0.1716327 0.002396892
## 31 31 0.1716822 0.002487311
## 32 32 0.1717959 0.002553038
## 33 33 0.1715732 0.002383614
## 34 34 0.1715078 0.002419299
## 35 35 0.1718124 0.002129954
## 36 36 0.1718835 0.002259084
## 37 37 0.1715404 0.002050448
## 38 38 0.1719209 0.002037610
## 39 39 0.1721002 0.002163424
## 40 40 0.1719425 0.002022634
## 41 41 0.1717959 0.002073330
## 42 42 0.1718450 0.001768809
## 43 43 0.1718935 0.002089969
## 44 44 0.1716652 0.002381223
## 45 45 0.1721390 0.002356919
## 46 46 0.1721610 0.002248196
## 47 47 0.1720521 0.002178913
## 48 48 0.1722910 0.002329793
## 49 49 0.1721170 0.002284532
## 50 50 0.1718510 0.002154651
plot(knn.boot)

knn.boot$best.parameters
##     k
## 25 25
# one random 80/20 training/test split
smp_size <- floor(0.80 * nrow(noNAData.num))
train_ind <- sample(seq_len(nrow(noNAData.num)), size = smp_size)

knntrain <- noNAData.num[train_ind, ]
knntest <- noNAData.num[-train_ind, ]

misCalk <- vector()
kValues <- vector()

# 50 times: choose k by 10-fold CV on the training set, then record the test error for that k
for (x1 in 1:50){
  knn1.pred <- tune.knn(x = knntrain[,-14], y = as.factor(knntrain[,14]), k = 1:50)
  kValues[x1] <- knn1.pred$best.parameters$k
  knnOutput <- knn(train = knntrain[,-14], test = knntest[,-14], cl = as.factor(knntrain[,14]), k = kValues[x1])
  knn1Tbl <- table(knnOutput, as.factor(knntest[,14]))
  misCalk[x1] <- 1 - (knn1Tbl[1,1] + knn1Tbl[2,2]) / sum(knn1Tbl)
}
# mean of the test errors
mean(misCalk)
## [1] 0.1688027
plot(x=kValues, y=misCalk)


With bootstrap resampling the optimal value is k = 25, while with 10-fold cross-validation it is k = 18.

k was tuned 50 times on the training part of a single 80/20 split. The lowest test errors were obtained for k = 12 and k = 18, and the mean test error over the 50 runs is 0.169.
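The tuning above calls knn(), which, judging by its train/test/cl arguments, is the one from package class. The minimal base-R sketch below makes explicit what k controls: a majority vote among the k nearest training rows. It runs on simulated two-dimensional data, not the census data. knn_predict is a hypothetical name, and ties go to the first level.

```r
# Hypothetical base-R k-nearest-neighbour classifier: Euclidean distance,
# majority vote among the k nearest training rows, ties go to the first level.
knn_predict <- function(train_x, train_y, test_x, k) {
  train_x <- as.matrix(train_x)
  test_x  <- as.matrix(test_x)
  train_y <- as.factor(train_y)
  apply(test_x, 1, function(row) {
    d2 <- colSums((t(train_x) - row)^2)   # squared distance to every training row
    nn <- order(d2)[seq_len(k)]           # the k nearest training rows
    votes <- table(train_y[nn])           # counts for every level, zeros included
    names(votes)[which.max(votes)]
  })
}

# Simulated data: the class depends on the sign of x1 + x2
set.seed(7)
x <- matrix(rnorm(400), ncol = 2)
y <- factor(ifelse(x[, 1] + x[, 2] > 0, "high", "low"))
train <- 1:150
test  <- 151:200

# Test error on one split for a few values of k
errs <- sapply(c(1, 5, 15), function(k) {
  pred <- knn_predict(x[train, ], y[train], x[test, ], k)
  mean(pred != as.character(y[test]))
})
errs
```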


Extra 15 points: variable importance in SVM

SVM does not appear to provide readily available tools for judging relative importance of different attributes in the model. Please evaluate here an approach similar to that employed by random forest where importance of any given attribute is measured by the decrease in model performance upon randomization of the values for this attribute.
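The approach described can be written as a model-agnostic function: shuffle one predictor in the test set, re-score the model, and average the drop in score over several shuffles. The sketch below uses hypothetical names. It reports the mean drop in the score, a variant of the fraction-of-shuffles-below-reference summary computed in the code that follows. Any prediction function and any higher-is-better metric, such as accuracy or AUC, can be plugged in.

```r
# Permutation importance: mean drop in `metric` when one column of x_test is shuffled.
# predict_fun(newdata) returns predictions; metric(pred, truth) is higher-is-better.
permutation_importance <- function(predict_fun, metric, x_test, y_test, n_rep = 10) {
  base_score <- metric(predict_fun(x_test), y_test)
  sapply(names(x_test), function(feature) {
    drops <- replicate(n_rep, {
      shuffled <- x_test
      shuffled[[feature]] <- sample(shuffled[[feature]])   # break feature-outcome link
      base_score - metric(predict_fun(shuffled), y_test)
    })
    mean(drops)
  })
}

# Toy check: the outcome depends on `a` only, so `b` should get zero importance
set.seed(3)
toy <- data.frame(a = rnorm(500), b = rnorm(500))
truth <- toy$a > 0
rule <- function(d) d$a > 0                    # stands in for a fitted model
accuracy <- function(pred, y) mean(pred == y)
permutation_importance(rule, accuracy, toy, truth)
```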

dmy <- dummyVars(" ~ .", data = noNAData, fullRank=T)
trsf <- data.frame(predict(dmy, newdata = noNAData))

#anyNA(trsf)

# split the data 50/50 into training and test sets
splitIndex <- sample(nrow(trsf), floor(0.5*nrow(trsf)))
trainDF <- trsf[ splitIndex,]
testDF  <- trsf[-splitIndex,]

outcomeName <- 'salary...50K'
predictorNames <- setdiff(names(trainDF),outcomeName)


# caret needs class labels that are valid R names for classification with class probabilities;
# salary...50K is 1 for >50K and 0 for <=50K
trainDF[,outcomeName] <- ifelse(trainDF[,outcomeName]==0,"lessthan50K","greaterthank50K")
testDF[,outcomeName] <- ifelse(testDF[,outcomeName]==0,"lessthan50K","greaterthank50K")

trainDF1=na.omit(trainDF)

#trctrl <- trainControl(method = "repeatedcv", classProbs=TRUE, returnResamp='none',summaryFunction=twoClassSummary,repeats = 3)

#svm.tune <- train(x=trainDF[,predictorNames],y= as.factor(trainDF[,outcomeName]),method = "svmRadial",tuneLength = 10,preProc = c("center","scale"), metric="ROC",trControl=trctrl)

trctrl <- trainControl(method = "repeatedcv",  classProbs =  TRUE,number=10,repeats = 3)

svm.tune <- train(salary...50K~.,data=trainDF,method = "svmRadial",tuneLength = 10,preProc = c("center","scale"),trControl=trctrl)
## 
## Attaching package: 'kernlab'
## The following object is masked from 'package:ggplot2':
## 
##     alpha
predictions <- predict(object=svm.tune, testDF[,predictorNames], type='prob')
# This is taken from stackoverflow as provided by the link
GetROC_AUC = function(probs, true_Y){
        # AUC approximation
        # http://stackoverflow.com/questions/4903092/calculate-auc-in-r
        # ty AGS
        probsSort = sort(probs, decreasing = TRUE, index.return = TRUE)
        val = unlist(probsSort$x)
        idx = unlist(probsSort$ix) 
        
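        roc_y = true_Y[idx];
        # with true_Y coded 1 for <=50K and 2 for >50K: false and true positive rates
        stack_x = cumsum(roc_y == 1)/sum(roc_y == 1)
        stack_y = cumsum(roc_y == 2)/sum(roc_y == 2)   
        
        # area under the ROC step curve: sum of (width of each x step) * (height)
        auc = sum((stack_x[2:length(roc_y)]-stack_x[1:(length(roc_y)-1)])*stack_y[2:length(roc_y)])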
        return(auc)
}

testOutcome <- ifelse(testDF[,outcomeName]=="lessthan50K",1,2)
refAUC <- GetROC_AUC(predictions[[1]],testOutcome )
print(paste('AUC score:', refAUC))
## [1] "AUC score: 0.779934704501859"
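As a cross-check on GetROC_AUC, note that the AUC equals the Mann-Whitney statistic: the probability that a randomly chosen >50K case receives a higher score than a randomly chosen <=50K case. A base-R sketch (the function name is hypothetical):

```r
# AUC via the rank-sum (Mann-Whitney) statistic; tied scores count as one half
auc_rank <- function(scores, is_positive) {
  r <- rank(scores)                       # average ranks handle ties
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(c(0.9, 0.8, 0.3, 0.1), c(TRUE, TRUE, FALSE, FALSE))  # 1 (perfect ranking)
auc_rank(c(0.9, 0.2, 0.6, 0.1), c(TRUE, TRUE, FALSE, FALSE))  # 0.75 (3 of 4 pairs)
```

On this test set it could be called as auc_rank(predictions[[1]], testOutcome == 2). This assumes, as GetROC_AUC does, that the first probability column refers to >50K (code 2 in testOutcome). The two should agree apart from how tied scores are ordered.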
# Shuffle predictions for variable importance
AUCShuffle <- NULL
shuffletimes <- 10
 
featuresMeanAUCs <- c()
for (feature in predictorNames) {
        featureAUCs <- c()
        shuffledData <- testDF[,predictorNames]
        for (iter in 1:shuffletimes) {
                shuffledData[,feature]<-sample(shuffledData[,feature],length(shuffledData[,feature]))
                predictions <- predict(object=svm.tune, shuffledData[,predictorNames], type='prob')
               # use the numeric 1/2 coding, as for refAUC; the character labels made every AUC NaN
               featureAUCs <- c(featureAUCs,GetROC_AUC(predictions[[1]], testOutcome))
        }
        featuresMeanAUCs <- c(featuresMeanAUCs, mean(featureAUCs < refAUC))
}
AUCShuffle <- data.frame('feature'=predictorNames, 'importance'=featuresMeanAUCs)
AUCShuffle <- AUCShuffle[order(AUCShuffle$importance, decreasing=TRUE),]
print(AUCShuffle)
##                                       feature importance
## 1                                         age         NA
## 2                        workclass..Local.gov         NA
## 3                          workclass..Private         NA
## 4                     workclass..Self.emp.inc         NA
## 5                 workclass..Self.emp.not.inc         NA
## 6                        workclass..State.gov         NA
## 7                      workclass..Without.pay         NA
## 8                             education..11th         NA
## 9                             education..12th         NA
## 10                         education..1st.4th         NA
## 11                         education..5th.6th         NA
## 12                         education..7th.8th         NA
## 13                             education..9th         NA
## 14                      education..Assoc.acdm         NA
## 15                       education..Assoc.voc         NA
## 16                       education..Bachelors         NA
## 17                       education..Doctorate         NA
## 18                         education..HS.grad         NA
## 19                         education..Masters         NA
## 20                       education..Preschool         NA
## 21                     education..Prof.school         NA
## 22                    education..Some.college         NA
## 23                              education_num         NA
## 24          marital_status..Married.AF.spouse         NA
## 25         marital_status..Married.civ.spouse         NA
## 26      marital_status..Married.spouse.absent         NA
## 27              marital_status..Never.married         NA
## 28                  marital_status..Separated         NA
## 29                    marital_status..Widowed         NA
## 30                   occupation..Armed.Forces         NA
## 31                   occupation..Craft.repair         NA
## 32                occupation..Exec.managerial         NA
## 33                occupation..Farming.fishing         NA
## 34              occupation..Handlers.cleaners         NA
## 35              occupation..Machine.op.inspct         NA
## 36                  occupation..Other.service         NA
## 37                occupation..Priv.house.serv         NA
## 38                 occupation..Prof.specialty         NA
## 39                occupation..Protective.serv         NA
## 40                          occupation..Sales         NA
## 41                   occupation..Tech.support         NA
## 42               occupation..Transport.moving         NA
## 43                relationship..Not.in.family         NA
## 44               relationship..Other.relative         NA
## 45                    relationship..Own.child         NA
## 46                    relationship..Unmarried         NA
## 47                         relationship..Wife         NA
## 48                   race..Asian.Pac.Islander         NA
## 49                                race..Black         NA
## 50                                race..Other         NA
## 51                                race..White         NA
## 52                                  sex..Male         NA
## 53                               capital_gain         NA
## 54                               capital_loss         NA
## 55                             hours_per_week         NA
## 56                     native_country..Canada         NA
## 57                      native_country..China         NA
## 58                   native_country..Columbia         NA
## 59                       native_country..Cuba         NA
## 60         native_country..Dominican.Republic         NA
## 61                    native_country..Ecuador         NA
## 62                native_country..El.Salvador         NA
## 63                    native_country..England         NA
## 64                     native_country..France         NA
## 65                    native_country..Germany         NA
## 66                     native_country..Greece         NA
## 67                  native_country..Guatemala         NA
## 68                      native_country..Haiti         NA
## 69                   native_country..Honduras         NA
## 70                       native_country..Hong         NA
## 71                    native_country..Hungary         NA
## 72                      native_country..India         NA
## 73                       native_country..Iran         NA
## 74                    native_country..Ireland         NA
## 75                      native_country..Italy         NA
## 76                    native_country..Jamaica         NA
## 77                      native_country..Japan         NA
## 78                       native_country..Laos         NA
## 79                     native_country..Mexico         NA
## 80                  native_country..Nicaragua         NA
## 81 native_country..Outlying.US.Guam.USVI.etc.         NA
## 82                       native_country..Peru         NA
## 83                native_country..Philippines         NA
## 84                     native_country..Poland         NA
## 85                   native_country..Portugal         NA
## 86                native_country..Puerto.Rico         NA
## 87                   native_country..Scotland         NA
## 88                      native_country..South         NA
## 89                     native_country..Taiwan         NA
## 90                   native_country..Thailand         NA
## 91            native_country..Trinadad.Tobago         NA
## 92              native_country..United.States         NA
## 93                    native_country..Vietnam         NA
## 94                 native_country..Yugoslavia         NA
RocImp <- filterVarImp(x = noNAData.bk[, -ncol(noNAData.bk)], y = noNAData.bk$salary)
head(RocImp)
##                  X...50K    X..50K
## age            0.6819298 0.6819298
## workclass      0.5136817 0.5136817
## education      0.5212537 0.5212537
## education_num  0.7140418 0.7140418
## marital_status 0.6421344 0.6421344
## occupation     0.5366588 0.5366588

For the extra 15 points: to reduce processing time, the SVM variable importance was computed on a subset of the data.
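The section does not show how the SVM subset was drawn. One way to keep the income classes in the same proportion as in the full data is a class-stratified sample. The sketch below is a base-R illustration on synthetic data; stratified_subsample, full and sub are hypothetical names, and caret's createDataPartition offers similar stratified sampling.

```r
# Hypothetical sketch: sample the same fraction of rows within each outcome class
stratified_subsample <- function(df, outcome, frac) {
  # Row indices grouped by outcome class
  groups <- split(seq_len(nrow(df)), df[[outcome]])
  # Draw round(frac * class size) rows from every class
  picked <- lapply(groups, function(rows) {
    rows[sample.int(length(rows), size = round(frac * length(rows)))]
  })
  df[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

# Synthetic stand-in for the census data (roughly 75% / 25% classes)
set.seed(7)
full <- data.frame(
  age = sample(18:70, 1000, replace = TRUE),
  salary = factor(sample(c("<=50K", ">50K"), 1000, replace = TRUE, prob = c(0.75, 0.25)))
)
sub <- stratified_subsample(full, "salary", frac = 0.2)
round(prop.table(table(full$salary)), 3)
round(prop.table(table(sub$salary)), 3)
```

The two proportion tables should agree to within rounding, so the subset keeps the class balance of the data it was drawn from.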

The number of shuffle repetitions used for variable importance was also reduced from around 500 to 10, which could affect the resulting list of important variables.
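With m shuffles per feature, the score mean(featureAUCs < refAUC) is a proportion, so its resolution and noise depend on m. Treating the shuffles as independent draws (an assumption), its standard error is approximately:

```latex
\hat{p} = \frac{1}{m}\sum_{k=1}^{m} \mathbf{1}\left[\mathrm{AUC}_k < \mathrm{AUC}_{\mathrm{ref}}\right],
\qquad
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{m}}
```

For p = 0.5 this is about 0.158 with m = 10 and about 0.022 with m = 500. With m = 10 the score can also take only the 11 values 0, 0.1, ..., 1, so ties between features are likely.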

The shuffle-based variable importance procedure was adapted from the steps at http://amunategui.github.io/variable-importance-shuffler/. For comparison, I also used filterVarImp from the caret package.
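The shuffle procedure works with any model that produces class probabilities. The sketch below illustrates it on synthetic data, with a logistic regression (glm) standing in for the SVM so that it runs without caret; all names are hypothetical. For each feature it reports both the fraction of shuffles that lowered the AUC (the score used above) and the mean drop in AUC, a commonly used alternative that is less coarse when m is small.

```r
# Hypothetical permutation-importance sketch; glm stands in for the SVM
set.seed(1)
n <- 400
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
dat$y <- factor(ifelse(dat$x1 + 0.5 * dat$x2 + rnorm(n) > 0, "high", "low"))
train <- dat[1:300, ]
test  <- dat[301:n, ]
fit <- glm(y ~ x1 + x2 + noise, data = train, family = binomial)

# Rank-based AUC; is_pos marks the class whose probability is the score
rank_auc <- function(score, is_pos) {
  r <- rank(score)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

is_pos <- test$y == "low"  # glm with a factor response models P(second level) = P("low")
ref_auc <- rank_auc(predict(fit, test, type = "response"), is_pos)

shuffle_times <- 10
importance <- sapply(c("x1", "x2", "noise"), function(feature) {
  aucs <- replicate(shuffle_times, {
    shuffled <- test
    shuffled[[feature]] <- sample(shuffled[[feature]])  # break the feature's link to y
    rank_auc(predict(fit, shuffled, type = "response"), is_pos)
  })
  c(mean_drop = ref_auc - mean(aucs), frac_lower = mean(aucs < ref_auc))
})
print(round(t(importance), 3))
```

The informative features (x1, then x2) should show clearly larger drops than noise, whose shuffled AUCs scatter around the reference value.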

The AUCShuffle table is sorted in decreasing order of importance.

However, every importance value in the SVM run above is NA, so the table is printed in its original column order: age and the workclass and education dummies appear first only because they come first in predictorNames, not because they rank highest. This run therefore does not identify the important SVM variables; the shuffle loop has to produce non-missing AUCs before a ranking can be read from it.
## 13 education..9th

For Random Forest, the five most important variables are capital gain, capital loss, marital status, occupation and age.

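filterVarImp scores each predictor by the area under its ROC curve with respect to salary (the dependent variable). Among the six predictors shown by head(RocImp), education_num (0.714) and age (0.682) have the largest areas, followed by marital_status (0.642), while occupation (0.537), education (0.521) and workclass (0.514) are close to 0.5, the value for a predictor with no discriminating power.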
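The per-predictor ROC area can be illustrated for numeric predictors by using each predictor directly as the classification score; the rank (Mann-Whitney) formula gives the same area as the trapezoidal ROC curve, ties included. The sketch below uses synthetic data and a hypothetical helper, predictor_auc. Folding the result with max(AUC, 1 - AUC) so that the direction of the relationship does not matter is an assumption of this sketch, not necessarily what caret does internally.

```r
# Hypothetical sketch: one-predictor ROC area as a filter-style importance score
predictor_auc <- function(x, y, positive) {
  is_pos <- y == positive
  r <- rank(x)  # average ranks for tied predictor values
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  auc <- (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  max(auc, 1 - auc)  # assumption: ignore the direction of the effect
}

# Synthetic data in which salary depends on age but not on hours
set.seed(42)
toy <- data.frame(age = round(runif(200, 18, 70)), hours = round(runif(200, 20, 60)))
toy$salary <- factor(ifelse(toy$age + rnorm(200, sd = 10) > 45, ">50K", "<=50K"))
aucs <- sapply(toy[c("age", "hours")], predictor_auc, y = toy$salary, positive = ">50K")
round(aucs, 3)
```

Here age should score well above 0.5 and hours close to it, mirroring the spread between education_num and workclass in the real output.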
***